I am trying to separate these values into ID, FullName and Phone. I know we can split them using Java's split function, but is there any other way to separate them? Values:
1 Peater John 2522523254
10 Neal Tom 2522523254
11 Tom Jackson 2522523254
111 Jack Smith 2522523254
12 Brownson Black 2522523254
I tried to use the substring method, but it doesn't work properly.
String id = line.substring(0, 3);
If I do this, it works up to the 4th line, but the others don't work properly.
If the columns are fixed-width you can use String.substring(). But you should also trim() the result before you try to convert it to a number:
String idTxt=line.substring(0,4);
Long id=Long.parseLong(idTxt.trim());
String name=line.substring(5,25).trim(); // or whatever the size is of name column.
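You can use a regex with Pattern and Matcher:
// Group 1: ID digits, group 2: name (no digits), group 3: phone digits
Pattern pattern = Pattern.compile("(\\d+)\\s+(\\D+?)\\s+(\\d+)");
Matcher matcher = pattern.matcher(content);
if (matcher.find()) {
    // Group 0 is the whole match, so the captured fields start at group 1
    String id = matcher.group(1);
    String name = matcher.group(2);
    String phone = matcher.group(3);
}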
package Generic;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main
{
public static void main(String[] args)
{
String txt=" 12 Brownson Black 2522523254";
String re1=".*?"; // Non-greedy match on filler
String re2="(\\d+)"; // Integer Number 1
String re3="(\\s+)"; // White Space 1
String re4="((?:[a-z][a-z]+))"; // Word 1
String re5="(\\s+)"; // White Space 2
String re6="((?:[a-z][a-z]+))"; // Word 2
String re7="(\\s+)"; // White Space 3
String re8="(\\d+)"; // Integer Number 2
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6+re7+re8,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
int id = Integer.parseInt(m.group(1));
String name =m.group(3) + " ";
name = name+m.group(5);
long phone = Long.parseLong(m.group(7));
System.out.println(id);
System.out.println(name);
System.out.println(phone);
}
}
}
What about this:
int first_space;
int last_space;
first_space = my_string.indexOf(' ');
last_space = my_string.lastIndexOf(' ');
if ((first_space > 0) && (last_space > first_space))
{
long id;
String full_name;
String phone;
id = Long.parseLong(my_string.substring(0, first_space));
full_name = my_string.substring(first_space + 1, last_space);
phone = my_string.substring(last_space + 1);
}
Use a regexp:
private static final Pattern RE = Pattern.compile(
"^\\s*(\\d+)\\s+(\\S+(?: \\S+)*)\\s+(\\d+)\\s*$");
Matcher matcher = RE.matcher(s);
if (matcher.matches()) {
System.out.println("ID: " + matcher.group(1));
System.out.println("FullName: " + matcher.group(2));
System.out.println("Phone: " + matcher.group(3));
}
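To try the pattern above on every sample line from the question, it can be wrapped in a small helper. This is only a sketch: the class name `RecordParser` and the `parse` method are made up for illustration, and it assumes each line is exactly an ID, a name, and a phone number in that order.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordParser {
    // Same pattern as in the answer above: digits, name words, digits.
    private static final Pattern RE = Pattern.compile(
            "^\\s*(\\d+)\\s+(\\S+(?: \\S+)*)\\s+(\\d+)\\s*$");

    /** Returns {id, fullName, phone}, or null if the line does not match (hypothetical helper). */
    static String[] parse(String line) {
        Matcher matcher = RE.matcher(line);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "1 Peater John 2522523254",
            "10 Neal Tom 2522523254",
            "11 Tom Jackson 2522523254",
            "111 Jack Smith 2522523254",
            "12 Brownson Black 2522523254"
        };
        for (String line : lines) {
            String[] parts = parse(line);
            System.out.println(parts == null
                    ? "no match: " + line
                    : String.join(" | ", parts));
        }
    }
}
```

Returning null for a non-matching line keeps malformed rows visible instead of silently producing wrong fields.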
You can use a StringTokenizer for this. You won't have to worry about the number of spaces and/or tabs before or after your values, and there is no need for complex regular expressions:
String line = " 1 Peater John\t2522523254 ";
StringTokenizer st = new StringTokenizer(line, " \t");
String id = "";
String name = "";
String phone = "";
// The first token is your id, you can parse it to an int if you like or need it
if(st.hasMoreTokens()) {
id = st.nextToken();
}
// Loop over the remaining tokens
while(st.hasMoreTokens()) {
String token = st.nextToken();
// As long as there are other tokens, you're processing the name
if(st.hasMoreTokens()) {
if(name.length() > 0) {
name = name + " ";
}
name = name + token;
}
// If there are no more tokens, you've reached the phone number
else {
phone = token;
}
}
System.out.println(id);
System.out.println(name);
System.out.println(phone);
Related
I would like to extract the Name and Age values from a text file. Can someone please help me?
The text content :
fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk
Name: Max Pain
Age: 99 Years
and they df;ml dk fdj,nbfdlkn ......
Code:
package myclass;
import java.io.*;
public class ReadFromFile2 {
public static void main(String[] args)throws Exception {
File file = new File("C:\\Users\\Ss\\Desktop\\s.txt");
BufferedReader br = new BufferedReader(new FileReader(file));
String st;
while ((st = br.readLine()) != null)
System.out.println(st.substring(st.lastIndexOf("Name:")));
// System.out.println(st);
}
}
Please try the code below.
public static void main(String[] args)throws Exception
{
File file = new File("/root/test.txt");
BufferedReader br = new BufferedReader(new FileReader(file));
String st;
while ((st = br.readLine()) != null) {
if(st.lastIndexOf("Name:") >= 0 || st.lastIndexOf("Age:") >= 0) {
System.out.println(st.substring(st.lastIndexOf(":")+1));
}
}
}
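Neither snippet above closes the BufferedReader. A minimal sketch of the same line-by-line idea with try-with-resources follows; the class name `NameAgeReader` and its `read` method are hypothetical, it assumes each label starts its own line, and a StringReader stands in for the FileReader so the example runs without a file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class NameAgeReader {
    /** Returns {name, age}; an entry stays null when its label never appears (hypothetical helper). */
    static String[] read(Reader source) {
        String name = null;
        String age = null;
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.startsWith("Name:")) {
                    name = trimmed.substring("Name:".length()).trim();
                } else if (trimmed.startsWith("Age:")) {
                    age = trimmed.substring("Age:".length()).trim();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String[] { name, age };
    }

    public static void main(String[] args) {
        // Assumption: in real use this would be new FileReader(file)
        String text = "fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk\n"
                + "Name: Max Pain\n"
                + "Age: 99 Years\n"
                + "and they df;ml dk fdj,nbfdlkn ......";
        String[] result = read(new StringReader(text));
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```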
You can use the replace method of the String class. Since String is immutable, each call returns a new string and leaves the original line unchanged.
while ((st = br.readLine()) != null) {
    if (st.startsWith("Name:")) {
        String name = st.replace("Name:", "").trim();
        st = br.readLine();
        String age = "";
        if (st != null && st.startsWith("Age:")) {
            age = st.replace("Age:", "").trim();
        }
        // now you should have the name and the age in those variables
    }
}
This will do your Job:
public static void main(String[] args) {
String str = "fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk Name: Max Pain Age: 99 Years and they df;ml dk fdj,nbfdlkn";
String[] split = str.split("(\\b: \\b)");
//\b represents an anchor like caret
// (it is similar to $ and ^)
// matching positions where one side is a word character (like \w) and
// the other side is not a word character
// (for instance it may be the beginning of the string or a space character).
System.out.println(split[1].replace("Age",""));
System.out.println(split[2].replaceAll("\\D+",""));
//remove everything except Integer ,i.e. Age
}
Output:
Max Pain
99
If they can occur on the same line and you want a pattern that does not overmatch, you could use a capturing group and a tempered greedy token:
\b(?:Name|Age):\h*((?:.(?!(?:Name|Age):))+)
For example
final String regex = "\\b(?:Name|Age):\\h*((?:.(?!(?:Name|Age):))+)";
final String string = "fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk \n"
+ "Name: Max Pain\n"
+ "Age: 99 Years\n"
+ "and they df;ml dk fdj,nbfdlkn ......\n\n"
+ "fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk \n"
+ "Name: Max Pain Age: 99 Years\n"
+ "and they df;ml dk fdj,nbfdlkn ......";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
Output
Max Pain
99 Years
Max Pain
99 Years
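The loop above prints values without saying which label each one belongs to. As a sketch of one possible variation (not part of the original answer), the label can be captured as its own group and the results stored in a map. The class name `FieldExtractor` and the `extract` method are made up for illustration, and only the first occurrence of each label is kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldExtractor {
    // Same tempered pattern as above, except the label is captured as group 1.
    private static final Pattern FIELD =
            Pattern.compile("\\b(Name|Age):\\h*((?:.(?!(?:Name|Age):))+)");

    /** Maps each label to the first value found for it (hypothetical helper). */
    static Map<String, String> extract(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = FIELD.matcher(text);
        while (matcher.find()) {
            // putIfAbsent keeps the first occurrence when a label repeats
            fields.putIfAbsent(matcher.group(1), matcher.group(2).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        String text = "fhsdgjfsdk;snfd fsd ;lknf;ksld sldkfj lk \n"
                + "Name: Max Pain Age: 99 Years\n"
                + "and they df;ml dk fdj,nbfdlkn ......";
        System.out.println(extract(text));
    }
}
```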
I have a multiline String as below. I want to extract 'VC-38NN' whenever a line contains 'Profoma invoice'. My code below prints the whole string once the search string is found.
Payment date
receipt serial
Profoma invoice VC-38NN
Welcome again
if(multilineString.toLowerCase().contains("Profoma invoice".toLowerCase()))
{
System.out.println(multilineString+"");
}
else
{
System.out.println("Profoma invoice not found");
}
Here are two possible solutions:
String input = "Payment date\n" +
"receipt serial\n" +
"Profoma invoice VC-38NN\n" +
"Welcome again";
// non-regex solution
String uppercased = input.toUpperCase();
// find "profoma invoice"
int profomaInvoiceIndex = uppercased.indexOf("PROFOMA INVOICE ");
if (profomaInvoiceIndex != -1) {
// find the first new line character after "profoma invoice".
int newLineIndex = uppercased.indexOf("\n", profomaInvoiceIndex);
if (newLineIndex == -1) { // if there is no new line after that, use the end of the string
newLineIndex = uppercased.length();
}
int profomaInvoiceLength = "profoma invoice ".length();
// substring from just after "profoma invoice" to the new line;
// taken from the original input so the result keeps its original case
String result = input.substring(profomaInvoiceIndex + profomaInvoiceLength, newLineIndex);
System.out.println(result);
}
// regex solution
Matcher m = Pattern.compile("^profoma invoice (.+)$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE).matcher(input);
if (m.find()) {
System.out.println(m.group(1));
}
Explanation in comments:
public class StackOverflow55313851 {
public final static String TEXT = "Profoma invoice";
public static void main(String[] args) {
String multilineString = "Payment date\n" +
"receipt serial\n" +
"Profoma invoice VC-38NN\n" +
"Welcome again";
// split text by line breaks
String[] lines = multilineString.split("\n");
// iterate over every line
for (String line : lines) {
// if it contains desired text
if (line.toLowerCase().contains(TEXT.toLowerCase())) {
// find position of desired text in this line
int indexOfInvoiceText = line.toLowerCase().indexOf(TEXT.toLowerCase());
// get only part of the line following the desired text
String invoiceNumber = line.substring(indexOfInvoiceText + TEXT.length() + 1);
System.out.println(invoiceNumber);
}
}
}
}
I am reading a text file which contains movie titles, year, language etc.
I am trying to grab those attributes.
Suppose some strings look like this:
String s = "\"A Fatal Inversion\" (1992)";
String d = "(aka \"Verhngnisvolles Erbe\" (1992)) (Germany)";
String f = "\"#Yaprava\" (2013) ";
String g = "(aka \"Love Heritage\" (2002)) (International: English title)";
How can I grab the title, the year, the country (if specified) and the kind of title (if specified) from these?
I am not very good with regex and patterns, and I don't know how to tell what kind of attribute a value is when it is not labelled. I am doing this because I am trying to generate XML from a text file. I have the DTD for it, but I'm not sure I need to use it in this case.
Edit: Here is what I have tried.
String pattern;
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m;
Pattern number = Pattern.compile("\\d+");
Matcher num;
m = p.matcher(s);
num = number.matcher(s);
if(m.find()){
System.out.println(m.group(1));
}
if(num.find()){
System.out.println(num.group(0));
}
I suggest you extract the year first as this seems fairly consistent. Then I'd extract the country (if present) and the rest I'll assume is the title.
For extracting the countries I'd recommend you hardcode a regex pattern with the names of known countries. It might take some iterating to determine what these are as they seem to be pretty inconsistent.
This code is a bit ugly (but then so is the data!):
public class Extraction {
public final String original;
public String year = "";
public String title = "";
public String country = "";
private String remaining;
public Extraction(String s) {
this.original = s;
this.remaining = s;
extractBracketedYear();
extractBracketedCountry();
this.title = remaining;
}
private void extractBracketedYear() {
Matcher matcher = Pattern.compile(" ?\\(([0-9]+)\\) ?").matcher(remaining);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
this.year = matcher.group(1);
matcher.appendReplacement(sb, "");
}
matcher.appendTail(sb);
remaining = sb.toString();
}
private void extractBracketedCountry() {
Matcher matcher = Pattern.compile("\\((Germany|International: English.*?)\\)").matcher(remaining);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
this.country = matcher.group(1);
matcher.appendReplacement(sb, "");
}
matcher.appendTail(sb);
remaining = sb.toString();
}
public static void main(String... args) {
for (String s : new String[] {
"A Fatal Inversion (1992)",
"(aka \"Verhngnisvolles Erbe\" (1992)) (Germany)",
"\"#Yaprava\" (2013) ",
"(aka \"Love Heritage\" (2002)) (International: English title)"}) {
Extraction extraction = new Extraction(s);
System.out.println("title = " + extraction.title);
System.out.println("country = " + extraction.country);
System.out.println("year = " + extraction.year);
System.out.println();
}
}
}
Produces:
title = A Fatal Inversion
country =
year = 1992
title = (aka "Verhngnisvolles Erbe")
country = Germany
year = 1992
title = "#Yaprava"
country =
year = 2013
title = (aka "Love Heritage")
country = International: English title
year = 2002
Once you've got this data, you can manipulate it further (e.g. "International: English title" -> "England").
I have a String template like this:
"Thanks, this is your value : [value]. And this is your account number : [accountNumber]"
And I have inputs like this:
input 1 : "Thanks, this is your value : 100. And this is your account number : 219AD098"
input 2 : "Thanks, this is your value : 150. And this is your account number : 90582374"
input 3 : "Thanks, this is your value : 200. And this is your account number : 18A47"
I want output like this:
output 1 : "[value] = 100 | [accountNumber] = 219AD098"
output 2 : "[value] = 150 | [accountNumber] = 90582374"
output 3 : "[value] = 200 | [accountNumber] = 18A47"
How can I do that? Maybe using regex?
Note: the template is not fixed. The only fixed parts are [value] and [accountNumber].
Use this regex:
(?<=value : )(\d+)|(?<=number : )(.+)(?=")
This will extract both values from the lines you want. After getting them you can concatenate them with anything you like to build your output string.
The code to use this regex looks like this:
// Note: the trailing (?=\") only matches when the line ends with a double quote, as in the quoted inputs
Pattern pattern = Pattern.compile("(?<=value : )(\\d+)|(?<=number : )(.+)(?=\")");
Matcher matcher = pattern.matcher(SOURCE_TEXT_LINE);
List<String> allMatches = new ArrayList<String>();
while (matcher.find()) {
allMatches.add(matcher.group());
}
This way you get the matched values in the ArrayList, or you can use a simple array if you prefer.
String text = "Thanks, this is your value : 100. And this is your account number : 219AD098";
Pattern pattern = Pattern
.compile("Thanks, this is your value : (\\d+). And this is your account number : (\\w+)");
Matcher matcher = pattern.matcher(text);
matcher.find();
String outputText = "[value] = " + matcher.group(1)
+ " | [accountNumber] = " + matcher.group(2);
System.out.println(outputText);
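Since the question says the template itself may change, hard-coding the sentence in the pattern is fragile. One possible approach, sketched here under assumptions, is to build the regex from the template: every `[name]` placeholder becomes a lazy capture group and the literal text around it is escaped with Pattern.quote. The class name `TemplateExtractor` and the `extract` method are hypothetical, and the sketch assumes any bracketed word in the template is a placeholder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateExtractor {
    // Assumption: a placeholder is a word in square brackets, e.g. [value]
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[(\\w+)\\]");

    /** Returns placeholder -> extracted text, or an empty map if the input does not fit the template. */
    static Map<String, String> extract(String template, String input) {
        StringBuilder regex = new StringBuilder();
        List<String> names = new ArrayList<>();
        Matcher ph = PLACEHOLDER.matcher(template);
        int last = 0;
        while (ph.find()) {
            // literal text before the placeholder, escaped so '.' and ':' match themselves
            regex.append(Pattern.quote(template.substring(last, ph.start())));
            regex.append("(.+?)");
            names.add(ph.group());
            last = ph.end();
        }
        regex.append(Pattern.quote(template.substring(last)));

        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex.toString()).matcher(input);
        if (m.matches()) {
            for (int i = 0; i < names.size(); i++) {
                result.put(names.get(i), m.group(i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "Thanks, this is your value : [value]. And this is your account number : [accountNumber]";
        String input = "Thanks, this is your value : 100. And this is your account number : 219AD098";
        StringJoiner out = new StringJoiner(" | ");
        for (Map.Entry<String, String> e : extract(template, input).entrySet()) {
            out.add(e.getKey() + " = " + e.getValue());
        }
        System.out.println(out);
    }
}
```

Using matches() rather than find() means an input that deviates from the template yields an empty map instead of a partial result.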
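It is easy to do without regex too:
String input = getInput();
String[] inputLines = input.split("\n");
StringBuilder output = new StringBuilder();
final String valueMarker = "value : ";
final String accountMarker = "account number : ";
int counter = 1;
// assumes every line contains both markers
for (String line : inputLines)
{
    // the value sits between "value : " and the full stop that ends the first sentence
    int valStart = line.indexOf(valueMarker) + valueMarker.length();
    String val = line.substring(valStart, line.indexOf('.', valStart));
    // the account number runs from after its marker to the end of the line, minus any closing quote
    int accStart = line.indexOf(accountMarker) + accountMarker.length();
    String accNum = line.substring(accStart).replace("\"", "").trim();
    output.append("output ").append(counter).append(" : \"[value] = ").append(val)
          .append(" | [accountNumber] = ").append(accNum).append("\"\n");
    counter++;
}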
Try this, using StringUtils.substringBefore from Apache Commons Lang:
String sCurrentLine = CURRENT_LINE;
String[] splitedValue = sCurrentLine.split(":");
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append(splitedValue[0].replace("input", "output"));
stringBuilder.append(": \"[value] = "+StringUtils.substringBefore(splitedValue[2], "."));
stringBuilder.append(" | [accountNumber] = "+splitedValue[3]);
You can use a regular expression.
Here is a full example:
package snippet;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String test = "Thanks, this is your value : 100 . And this is your account number : 219AD098";
String valueExpression = "\\svalue\\s:([^.]+)";
String accExpresion = "\\saccount\\snumber\\s:([^$]+)";
System.out.println("Value:" + runSubRegex(valueExpression, test));
System.out.println("Account:" + runSubRegex(accExpresion, test));
}
private static String runSubRegex(String regex, String tag) {
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(tag);
if (matcher.find()) {
return matcher.group(1);
}
return null;
}
}
Output
Value: 100
Account: 219AD098
Just check it out.
String template = "Thanks, this is your value : -XXXX-. And this is your account number : -XXXX- -XXXX- Value,Account Number";
String input = "Thanks, this is your value : 100. And this is your account number : 219AD098";
/*String template = "You can use -XXXX- mehod to read values from -XXXX- Value 1,value 2";
String input = "You can use this mehod to read values from custom string template";*/
String[] splitValue = template.split("-XXXX-");
for (String splitValueTemp : splitValue) {
input = input.replace(splitValueTemp, "!");
}
List<String> value = Arrays.asList(input.split("!"));
List<String> Key = Arrays.asList(splitValue[splitValue.length - 1].split(","));
if (value != null && value.size() > 1) {
int iCnt = 0;
for (String opValue : value.subList(1, value.size())) {
if (Key.size() > iCnt) {
System.out.println(Key.get(iCnt).trim() + " : " + opValue.trim());
}
iCnt++;
}
}
O/P:
Value : 100
Account Number : 219AD098
String linkPattern = "\\[[A-Za-z_0-9]+\\]";
String text = "[build]/directory/[something]/[build]/";
RegExp reg = RegExp.compile(linkPattern,"g");
MatchResult matchResult = reg.exec(text);
for (int i = 0; i < matchResult.getGroupCount(); i++) {
System.out.println("group" + i + "=" + matchResult.getGroup(i));
}
I am trying to get all blocks that are enclosed in square brackets from a path string,
but I only get group0="[build]". What I want is:
1:"[build]" 2:"[something]" 3:"[build]"
EDIT:
Just to be clear, the words inside the brackets are generated as random text:
public static String genText()
{
final int LENGTH = (int)(Math.random()*12)+4;
StringBuffer sb = new StringBuffer();
for (int x = 0; x < LENGTH; x++)
{
sb.append((char)((int)(Math.random() * 26) + 97));
}
String str = sb.toString();
str = str.substring(0,1).toUpperCase() + str.substring(1);
return str;
}
EDIT 2:
The JDK regex works fine; GWT RegExp shows this problem.
SOLVED:
Answer from Didier L
String linkPattern = "\\[[A-Za-z_0-9]+\\]";
String result = "";
String text = "[build]/directory/[something]/[build]/";
RegExp reg = RegExp.compile(linkPattern,"g");
MatchResult matchResult = null;
while((matchResult=reg.exec(text)) != null){
if(matchResult.getGroupCount()==1)
System.out.println( matchResult.getGroup(0));
}
I don't know which regex library you are using, but with the one from the JDK it would go along the lines of:
String linkPattern = "\\[[A-Za-z_0-9]+\\]";
String text = "[build]/directory/[something]/[build]/";
Pattern pat = Pattern.compile(linkPattern);
Matcher mat = pat.matcher(text);
while (mat.find()) {
System.out.println(mat.group());
}
Output:
[build]
[something]
[build]
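The loop matters because a group count describes the capture groups inside one match, not how many matches the text contains. The following JDK-only sketch shows both numbers side by side; the class name `BracketBlocks` and the `findAll` method are made up for illustration, and GWT's RegExp is a different API that is not exercised here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketBlocks {
    private static final Pattern BLOCK = Pattern.compile("\\[[A-Za-z_0-9]+\\]");

    /** Collects every bracketed block by calling find() until it returns false. */
    static List<String> findAll(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher mat = BLOCK.matcher(text);
        while (mat.find()) {
            blocks.add(mat.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String text = "[build]/directory/[something]/[build]/";
        // The pattern has no parentheses, so there are no capture groups beyond group 0.
        System.out.println("capture groups: " + BLOCK.matcher(text).groupCount());
        System.out.println("matches: " + findAll(text));
    }
}
```

In the JDK, Matcher.groupCount() excludes group 0, so iterating up to the group count can never enumerate repeated matches of a pattern without parentheses.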
Try:
String linkPattern = "(\\[[A-Za-z_0-9]+\\])*";
EDIT:
Second try:
String linkPattern = "\\[(\\w+)\\]+";
Third try, see http://rubular.com/r/eyAQ3Vg68N