Regex extract string, why doesn't my pattern work? - java

I have a long string in this format (a long single line in file):
"1":"Aname","2":"AnotherName","3":"Sempronio"
I want to extract the number and the name and save them in a Map.
I tried this:
FileReader fileReader = null;
BufferedReader br = null;
File file = new File("./SingleLineFileNames.txt");
try {
fileReader = new FileReader(file);
br = new BufferedReader(fileReader);
String line;
Pattern p = Pattern.compile("\"(\\d+)\":\"([\\w-.' ]+)\"");
Matcher matcher;
while((line = br.readLine()) != null) {
matcher = p.matcher(line);
String name;
int i = 1;
while((name = matcher.group(i)) != null){
// save in map
i++;
}
}
}
catch (FileNotFoundException e) {
e.printStackTrace();
}
catch (IOException e) {
e.printStackTrace();
}
finally {
try {
br.close();
fileReader.close();
}
catch (IOException e) {
e.printStackTrace();
}
}
return null;
The result is java.lang.IllegalStateException: No match found
Is this the right way to iterate over groups?
Where did I go wrong?

First split the String at , (String#split), then split each resulting array element at : to get the key and the value. With input strings this simple, I wonder what kind of masochism drives developers to use regex sledgehammers to crack such small nuts.
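The split-based approach described above could look roughly like the sketch below. The class name SplitParser and the method parse are made up for illustration, and the code assumes that names never contain a comma or a colon, which the question does not guarantee.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SplitParser {
    // Parses input like "1":"Aname","2":"AnotherName" into a number -> name map.
    // Assumption: names never contain ',' or ':' (otherwise the splits go wrong).
    static Map<Integer, String> parse(String line) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (String pair : line.split(",")) {
            String[] keyValue = pair.split(":", 2); // limit 2: split at the first colon only
            if (keyValue.length != 2) {
                continue; // skip malformed entries
            }
            // Strip the surrounding double quotes from key and value.
            int key = Integer.parseInt(keyValue[0].replace("\"", "").trim());
            String value = keyValue[1].replace("\"", "").trim();
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"1\":\"Aname\",\"2\":\"AnotherName\",\"3\":\"Sempronio\""));
    }
}
```

If names may contain commas or colons, the regex approach is the safer choice.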

If you use a hyphen inside [], always place it first or last so it cannot be read as a range operator.
Pattern p = Pattern.compile("\"(\\d+)\":\"([-\\w.' ]+)\"");
^ here
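Also, the way you call group() is not correct: you have to call find() (or matches()) first, otherwise group() throws the IllegalStateException you are seeing. Iterate like this: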
while(matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
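Putting the find() loop together with the Map from the question, a minimal sketch might look like this. The class name RegexParser and the method parse are hypothetical; the pattern is the one shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexParser {
    // Group 1 is the number, group 2 is the name.
    private static final Pattern PAIR = Pattern.compile("\"(\\d+)\":\"([-\\w.' ]+)\"");

    static Map<Integer, String> parse(String line) {
        Map<Integer, String> result = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(line);
        // Each successful find() advances to the next "number":"name" pair;
        // group() may only be called after find() has returned true.
        while (matcher.find()) {
            result.put(Integer.parseInt(matcher.group(1)), matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"1\":\"Aname\",\"2\":\"AnotherName\",\"3\":\"Sempronio\""));
    }
}
```

The group index selects a capture group inside one match; it does not step through successive matches, which is why the loop in the question cannot work.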

Remove the broken square bracket construct ([\\w-.' ]+). If the names contain only word characters, (\\w+) is enough there.

Related

How to split the input string into corresponding utf- characters?

I have to split the strings of an input text file (which is in Hindi) in Java.
Is there a way to do so? I have tried to split it into single characters but that doesn't work. For example:
मुझे बहुत सारा काम करना है|
then the output should be
मु
झे
ब
हु
त
सा
रा
का
म
क
र
ना
है
This reads the file and splits each line on whitespace:
// try-with-resources closes the reader even if reading fails
try (BufferedReader in = new BufferedReader(new FileReader("your text file path goes here"))) {
    String read;
    while ((read = in.readLine()) != null) {
        // split the line on runs of whitespace
        String[] splited = read.split("\\s+");
        for (String part : splited) {
            System.out.println(part);
        }
    }
} catch (IOException e) {
    System.out.println("There was a problem: " + e);
    e.printStackTrace();
}
Note: provide the full file path to the FileReader.
All strings in Java are Unicode (UTF-16 internally), so splitting character by character may give unexpected results.
You may refer to this question; it seems to be a similar problem.
Try this out
String s = new String("मुझे बहुत सारा काम करना है");
for(int i =0 ;i<s.length();i++){
System.out.println(s.charAt(i));
}
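The charAt loop above prints vowel signs such as ु on their own line, which is not the output asked for. A possible alternative, sketched under assumptions, is to attach every Unicode combining mark to the base character before it. The class SyllableSplitter is hypothetical. The sketch does not merge conjunct consonants joined by a virama, and punctuation comes out as its own part.

```java
import java.util.ArrayList;
import java.util.List;

class SyllableSplitter {
    // Groups each base character with the combining marks that follow it.
    // Whitespace separates groups and is not emitted.
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            int type = Character.getType(cp);
            boolean isMark = type == Character.NON_SPACING_MARK
                    || type == Character.COMBINING_SPACING_MARK
                    || type == Character.ENCLOSING_MARK;
            // A non-mark character starts a new group, so flush the previous one.
            if (!isMark && current.length() > 0) {
                parts.add(current.toString());
                current.setLength(0);
            }
            if (isMark || !Character.isWhitespace(cp)) {
                current.appendCodePoint(cp);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String part : split("मुझे बहुत सारा काम करना है")) {
            System.out.println(part);
        }
    }
}
```

When reading the text from a file, the reader should be opened with the file's actual encoding (for example UTF-8) rather than the platform default.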

Java - Groovy : regex parse text block

I know that this is a common question and I've been through a lot of forums to figure out what the problem in my code is.
I have to read a text file with several blocks in the following format:
import com.myCompanyExample.gui.Layout
/*some comments here*/
#Layout
LayoutModel currentState() {
MyBuilder builder = new MyBuilder()
form example
title form{
row_1
row_1
row_n
}
return build.get()
}
#Layout
LayoutModel otherState() {
....
....
return build.get()
}
I have this code to read the whole file, and I'd like to extract each block between the keyword "#Layout" and the keyword "return". I also need to capture all newlines so that later I'll be able to split each matched block into a list.
private void myReadFile(File fileLayout){
String line = null;
StringBuilder allText = new StringBuilder();
try{
FileReader fileReader = new FileReader(fileLayout);
BufferedReader bufferedReader = new BufferedReader(fileReader);
while((line = bufferedReader.readLine()) != null) {
allText.append(line)
}
bufferedReader.close();
}
catch(FileNotFoundException ex) {
System.out.println("Unable to open file");
}
catch(IOException ex) {
System.out.println("Error reading file");
}
Pattern pattern = Pattern.compile("(?s)#Layout.*?return",Pattern.DOTALL);
Matcher matcher = pattern.matcher(allText);
while(matcher.find()){
String [] layoutBlock = (matcher.group()).split("\\r?\\n")
for(index in layoutBlock){
//check each line of the current block
}
}
But layoutBlock always has size 1.
I think this may be a so-called XY problem. Anyway, if the Groovy source consists only of #Layout-annotated blocks of code, you can use a tempered greedy token to select everything up to the next annotation.
Change the pattern line to this:
Pattern pattern = Pattern.compile( "#Layout(?:(?!#Layout).)*", Pattern.DOTALL );
PS: the inline flag (?s) inside the regex and the parameter Pattern.DOTALL do the same thing (they make . match line terminators as well, the so-called single-line mode, not to be confused with Pattern.MULTILINE); use only one of them.
UPDATE
I tried your code; the problem (preserving newlines) is in the way you slurp the file: bufferedReader.readLine() strips the line terminator from each line it returns.
Simply re-add a newline when appending to allText:
String ln = System.lineSeparator();
while((line = bufferedReader.readLine()) != null) {
allText.append(line + ln);
}
Or you can replace all the code to slurp the file with this:
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
//can throw an IOException
String filePath = "/path/to/layout.groovy";
String allText = new String(Files.readAllBytes(Paths.get(filePath)),StandardCharsets.UTF_8);
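Once the newlines are preserved, the tempered greedy token pattern and the split into lines can be combined as in the sketch below. The class LayoutBlocks, the method extract and the sample text in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LayoutBlocks {
    // Matches "#Layout" and everything after it up to (not including) the next "#Layout".
    private static final Pattern BLOCK =
            Pattern.compile("#Layout(?:(?!#Layout).)*", Pattern.DOTALL);

    // Returns one String[] of lines per #Layout block.
    static List<String[]> extract(String allText) {
        List<String[]> blocks = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(allText);
        while (matcher.find()) {
            blocks.add(matcher.group().split("\\r?\\n"));
        }
        return blocks;
    }

    public static void main(String[] args) {
        String sample = "import x\n#Layout\nA a() {\n  return 1\n}\n#Layout\nB b() {\n  return 2\n}\n";
        for (String[] block : extract(sample)) {
            System.out.println(block.length + " lines, first: " + block[0]);
        }
    }
}
```

Note that split drops trailing empty strings, so a block ending in a newline does not produce an empty last element.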

Assigning part of txt file to java variable

I have a txt file with the following output:
"CN=COUD111255,OU=Workstations,OU=Mis,OU=Accounts,DC=FLHOSP,DC=NET"
What I'm trying to do is read the COUD111255 part and assign it to a Java variable. I assigned ldap to sCurrentLine, but I'm getting a NullPointerException. Any suggestions?
try (BufferedReader br = new BufferedReader(new FileReader("resultofbatch.txt")))
{
final Pattern PATTERN = Pattern.compile("CN=([^,]+).*");
try {
while ((sCurrentLine = br.readLine()) != null) {
//Write the function you want to do, here.
String[] tokens = PATTERN.split(","); //This will return you a array, containing the string array splitted by what you write inside it.
//should be in your case the split, since they are seperated by ","
}
System.out.println(sCurrentLine);
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
} catch (IOException e2) {
// TODO Auto-generated catch block
e2.printStackTrace();
}
}
});
You just need to read data from the file line by line and assign each line to your variable str. Refer to the following question:
How to read a large text file line by line using Java?
Your code is almost correct. You are writing this string to standard output - what for? If I understand you right, what you need is simply this:
private static final Pattern PATTERN = Pattern.compile("CN=([^,]+).*");
public static String solve(String str) {
Matcher matcher = PATTERN.matcher(str);
if (matcher.matches()) {
return matcher.group(1);
} else {
throw new IllegalArgumentException("Wrong string " + str);
}
}
This call
solve("CN=COUD111255,OU=Workstations,OU=Mis,OU=Accounts,DC=FLHOSP,DC=NET")
gave me "COUD111255" as answer.
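One caveat: the line in the question's file is wrapped in double quotes, and matches() requires the whole string to fit the pattern, so a line starting with a quote would be rejected. A variant that tolerates this, sketched with a hypothetical class CommonName, uses find() and excludes the quote from the captured value.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommonName {
    // Captures everything after "CN=" up to the next comma or double quote.
    private static final Pattern CN = Pattern.compile("CN=([^,\"]+)");

    static Optional<String> extract(String line) {
        Matcher matcher = CN.matcher(line);
        // find() searches anywhere in the line, so leading quotes are ignored.
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "\"CN=COUD111255,OU=Workstations,OU=Mis,OU=Accounts,DC=FLHOSP,DC=NET\"";
        System.out.println(extract(line).orElse("no CN found"));
    }
}
```

Returning Optional instead of throwing is a design choice for this sketch; throwing IllegalArgumentException as in the answer above works just as well.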
To read from a .txt file, use BufferedReader. To create one, write:
BufferedReader br = new BufferedReader(new FileReader("testing.txt"));
testing.txt is the name of the file you're reading; a relative path like this is resolved against the working directory of your program. After initializing, continue with:
while ((CurrentLine = br.readLine()) != null) {
//Write the function you want to do, here.
String[] tokens = CurrentLine.split(","); //This will return you a array, containing the string array splitted by what you write inside it.
//should be in your case the split, since they are seperated by ","
}
You now have the tokens array: [CN=COUD111255, OU=Workstations, OU=Mis, OU=Accounts, DC=FLHOSP, DC=NET].
So take element 0 of the array and use it: that gives you CN=COUD111255. I'm stopping here so as not to give away the whole code.
Hope that helps!

JAVA: Getting the content of specific strings from text files

I have a text file like this:
text
text
text
.
.
#data
instances1
instances2
.
.
instancesN
I want to get the contents of this file from #data until the end of the file. How can I do that?
I found this method of the FileUtils class (from Apache Commons IO), but it's usable only if I already know the line number.
String ln = FileUtils.readLines(new File("arff_file/"+results.get(0)))
.get(lineNumber);
Since you are using Apache Commons, you can do it in one line:
String contents = FileUtils.readFileToString(new File("arff_file/"+results.get(0)), "UTF-16").replaceAll("(?s)^.*?(?=#data)", "");
This works by
reading the whole file into a single String
using regex-based replaceAll() to remove (by replacing with a blank) everything up to, but not including, #data
The regex breakdown of (?s)^.*?(?=#data) is:
(?s) the DOTALL flag, so that . also matches line terminators (without it the match cannot cross a line break and nothing is removed)
^ start of input
.*? a reluctantly quantified wildcard
(?=#data) a positive (non-consuming) look ahead that asserts that the next input is #data
The reluctant quantifier matters: it stops at the first #data instead of skipping past it, in case #data appears more than once in the input.
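The same idea can be tried without Commons on an in-memory string, as in the sketch below. The class AfterMarker and the method fromData are hypothetical. Because . does not match line terminators by default in Java, the sketch enables DOTALL with (?s); it also uses replaceFirst, since the ^ anchor can match only once anyway.

```java
class AfterMarker {
    // Removes everything before the first "#data"; the marker itself is kept.
    // If the input contains no "#data", it is returned unchanged.
    static String fromData(String contents) {
        return contents.replaceFirst("(?s)^.*?(?=#data)", "");
    }

    public static void main(String[] args) {
        String contents = "text\ntext\n#data\ninstances1\ninstances2\n";
        System.out.print(fromData(contents));
    }
}
```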
try {
String file = "fileName";
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
while ((line = br.readLine()) != null) {
if (line.equals("#data"))
nowRead(br);//I just do this for more efficiency, you can set a boolean flag instead
}
br.close();
}catch (IOException e) {
//OMG Exception again!
}
}
static ArrayList<String> nowRead(BufferedReader br) throws IOException {
ArrayList<String> s = new ArrayList<String>();// do it as you wish
String line;
while ((line = br.readLine()) != null) {
s.add(line);
}
return s;
}
Path start = Paths.get("test.txt");
try
{
List<String> lines = Files.readAllLines(start);
for (Iterator<String> it = lines.iterator(); it.hasNext();)
{
String line = it.next();
if (!"#data".equals(line.trim()))
{
it.remove();
}
else
{
break;
}
}
System.out.println(lines);
}
catch (IOException e)
{
e.printStackTrace();
}
I was reading about Path online, so why not something like this as an alternative to Bohemian's code?
Maybe something could be done with Java 8's stream(), but I haven't come up with anything yet...
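A stream-based version is possible, though the convenient method for it, Stream.dropWhile, only exists from Java 9 onwards, so the sketch below assumes Java 9 or later. The class DataSection and the method fromMarker are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DataSection {
    // Drops lines until the first line that equals "#data" (ignoring surrounding
    // whitespace), then keeps that line and everything after it.
    static List<String> fromMarker(List<String> lines) {
        return lines.stream()
                .dropWhile(line -> !"#data".equals(line.trim()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("text", "text", "#data", "instances1", "instances2");
        System.out.println(fromMarker(lines));
    }
}
```

For a real file, the list could come from Files.readAllLines(path), or the same dropWhile call could be applied directly to the stream returned by Files.lines(path).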

Problems trying to retrieve information from txt file

I'm stuck on one issue in my application. I have a text file that contains a piece of code that I need to retrieve and store in a String variable. What is the best way to do this? I ran the samples below, but they are logically incorrect or incomplete. Take a look:
Reading line by line:
BufferedReader bfr = new BufferedReader(new FileReader(Node));
String line = null;
try {
while( (line = bfr.readLine()) != null ){
line.contentEquals("d.href");
System.out.println(line);
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Reading character by character:
BufferedReader bfr = new BufferedReader(new FileReader(Node));
int i = 0;
try {
while ((i = bfr.read()) != -1) {
char ch = (char) i;
System.out.println(Character.toString(ch));
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
};
Reading with Scanner:
BufferedReader bfr = new BufferedReader(new FileReader(Node));
int wordCount = 0, totalcount = 0;
Scanner s = new Scanner(googleNode);
while (s.hasNext()) {
totalcount++;
if (s.next().contains("(?=d.href).*?(}=?)")) wordCount++;
}
System.out.println(wordCount+" "+totalcount);
With (1.) I'm having difficulty finding the d.href that marks the start of the code piece.
With (2.) I can't think of a way to store d.href as a string and retrieve the rest of the information.
With (3.) I can correctly find d.href but I can't retrieve pieces of the text.
Could anyone help me, please?
To answer my own question: I used a Scanner to read the text file word by word. contains("window.maybeRedirectForGBV") returns a boolean, and next() returns a String. I stop scanning at the word just before the one I want, then call next() once more to store the following word in a String variable. From that point you only need to process the string however you want. Hope this helps.
String stringSplit = null;
Scanner s = new Scanner(Node);
while (s.hasNext()) {
if (s.next().contains("window.maybeRedirectForGBV")){
stringSplit = s.next();
break;
}
}
You can use regular expressions like this:
Pattern pattern = Pattern.compile("^\\s*d\\.href([^=]*)=(.*)$");
// Groups: 1-----1 2--2
// Possibly spaces, "d.href", any characters not '=', the '=', any chars.
....
Matcher m = pattern.matcher(line);
if (m.matches()) {
String dHrefSuffix = m.group(1);
String value = m.group(2);
System.out.println(value);
break;
}
BufferedReader will do.
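For completeness, the elided part of the regex answer could be filled in roughly as follows. This is only a sketch: the class HrefFinder, the method firstValue and the sample lines in main are hypothetical, and the lines are passed in as a list rather than read from the question's file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefFinder {
    // Group 1: whatever follows "d.href" before the '='; group 2: everything after '='.
    private static final Pattern D_HREF = Pattern.compile("^\\s*d\\.href([^=]*)=(.*)$");

    // Returns the trimmed right-hand side of the first "d.href... = ..." line, if any.
    static Optional<String> firstValue(List<String> lines) {
        for (String line : lines) {
            Matcher m = D_HREF.matcher(line);
            if (m.matches()) {
                return Optional.of(m.group(2).trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("var x = 1;", "  d.href = \"http://example.com\";");
        System.out.println(firstValue(lines).orElse("not found"));
    }
}
```

With a file, the same loop can run over the lines returned by BufferedReader.readLine().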
