Java Matcher: How to match multiple lines with one regex

My method takes a file, and tries to extract the text between the header ###Title### and closing ###---###. I need it to extract multiple lines and put each line into an array. But since readAllLines() converts all lines into an array, I don't know how to compare and match it.
public static ArrayList<String> getData(File f, String title) throws IOException {
ArrayList<String> input = (ArrayList<String>) Files.readAllLines(f.toPath(), StandardCharsets.US_ASCII);
ArrayList<String> output = new ArrayList<String>();
//String? readLines = somehow make it possible to match
System.out.println("Checking entry.");
Pattern p = Pattern.compile("###" + title + "###(.*)###---###", Pattern.DOTALL);
Matcher m = p.matcher(readLines);
if (m.matches()) {
m.matches();
String matched = m.group(1);
System.out.println("Contents: " + matched);
String[] array = matched.split("\n");
ArrayList<String> array2 = new ArrayList<String>();
for (String j:array) {
array2.add(j);
}
output = array2;
} else {
System.out.println("No matches.");
}
return output;
}
Here is my file, and I'm 100% sure that the compiler is reading the correct one.
###Test File###
Entry 1
Entry 2
Data 1
Data 2
Test 1
Test 2
###---###
The output says "No matches." instead of the entries.

You don't need regex for that. It's enough to loop through the array and compare items line by line, taking those between the start and end tags.
ArrayList<String> input = (ArrayList<String>) Files.readAllLines(f.toPath(), StandardCharsets.US_ASCII);
ArrayList<String> output = new ArrayList<String>();
boolean matched = false;
for (String line : input) {
if (line.equals("###---###") && matched) matched = false; // closing tag: stop collecting
if (matched) output.add(line); // inside the section: keep the line
if (line.equals("###Test File###") && !matched) matched = true; // header: start collecting from the next line
}
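
If a regex is still preferred, the lines first have to be joined back into a single string, because a Matcher works on one CharSequence rather than on a list. The following is only a minimal sketch under some assumptions: the class name SectionExtractor and the helper extract are hypothetical, the title is wrapped in Pattern.quote in case it contains regex metacharacters, and find() is used instead of matches() because matches() only succeeds when the pattern covers the entire input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionExtractor {
    // Hypothetical helper: returns the lines between ###title### and ###---###.
    static List<String> extract(List<String> lines, String title) {
        // Join the lines so one Matcher can see the whole file content.
        String joined = String.join("\n", lines);
        // DOTALL lets '.' match newlines; the reluctant (.*?) stops at the first closing tag.
        Pattern p = Pattern.compile(
                "###" + Pattern.quote(title) + "###\\n(.*?)\\n###---###", Pattern.DOTALL);
        Matcher m = p.matcher(joined);
        List<String> output = new ArrayList<>();
        if (m.find()) {
            output.addAll(Arrays.asList(m.group(1).split("\n")));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "###Test File###", "Entry 1", "Entry 2", "###---###");
        System.out.println(extract(lines, "Test File")); // prints [Entry 1, Entry 2]
    }
}
```

Note that this sketch expects at least one line between the two tags; an empty section would need a slightly different pattern.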

As per your comment, if the files always look like the one you posted, then I don't think a regex is needed for this requirement. You can read the file line by line and skip every line that contains '###'.
public static void main(String[] args)
{
    ArrayList<String> dataList = new ArrayList<String>();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream("textfile.txt")))) {
        String strLine;
        // Read the file line by line
        while ((strLine = br.readLine()) != null) {
            // Skip the header and footer lines, which contain '###'
            if (!strLine.contains("###")) {
                dataList.add(strLine);
            }
        }
    } catch (Exception e) { // Catch exception if any
        System.err.println("Error: " + e.getMessage());
    }
    // Now dataList has all the data between ###Test File### and ###---###
}
You can also change the contains method parameter according to your requirement to ignore lines!

Related

How to count duplicate entries in a .csv file?

I have a .csv file that is formatted like this:
ID,date,itemName
456,1-4-2020,Lemon
345,1-3-2020,Bacon
345,1-4-2020,Sausage
123,1-1-2020,Apple
123,1-2-2020,Pineapple
234,1-2-2020,Beer
345,1-4-2020,Cheese
I have already implemented the algorithm to go through the file, scan for the first number, sort by it in ascending order, and make a new output:
123,1-1-2020,Apple
123,1-2-2020,Pineapple
234,1-2-2020,Beer
345,1-3-2020,Bacon
345,1-4-2020,Cheese
345,1-4-2020,Sausage
456,1-4-2020,Lemon
My question is, how do I implement my algorithm to make an output that counts the duplicate first number entries and reformat it to make it look like this...
123,1-1-2020,1,Apple
123,1-2-2020,1,Pineapple
234,1-2-2020,1,Beer
345,1-3-2020,1,Bacon
345,1-4-2020,2,Cheese,Sausage
456,1-4-2020,1,Lemon
...so that it counts the number of occurrence for each ID, denote it with the number of times, and if the date of that ID is also the same, combine the item names to the same line. Below is my source code (each line in the .csv is made into an object named 'receipt' that has ID, date, and name with their respective get() methods):
public class ReadFile {
private static List<Receipt> readFile() {
List<Receipt> receipts = new ArrayList<>();
try {
BufferedReader reader = new BufferedReader(new FileReader("dataset.csv"));
// Move past the first title line
reader.readLine();
String line = reader.readLine();
// Start reading from second line till EOF, split each string at ","
while (line != null) {
String[] attributes = line.split(",");
Receipt attribute = getAttributes(attributes);
receipts.add(attribute);
line = reader.readLine();
}
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
return receipts;
}
private static Receipt getAttributes(String[] attributes) {
// Get ID located before the first ","
long memberNumber = Long.parseLong(attributes[0]);
// Get date located after the first ","
String date = attributes[1];
// Get name located after the second ","
String name = attributes[2];
return new Receipt(memberNumber, date, name);
}
// Parse the data into new file after sorting
private static void parse(List<Receipt> receipts) {
PrintWriter output = null;
try {
output = new PrintWriter("output.txt");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
// For each receipts, assert the text output stream is not null, print line.
for (Receipt p : receipts) {
assert output != null;
output.println(p.getMemberNumber() + "," + p.getDate() + "," + p.getName());
}
assert output != null;
output.close();
}
// Main method, accept input file, sort and parse
public static void main(String[] args) {
List<Receipt> receipts = readFile();
QuickSort q = new QuickSort();
q.quickSort(receipts);
parse(receipts);
}
}

The easiest way is to use a map.
Sample data from your file.
String[] lines = {
"123,1-1-2020,Apple",
"123,1-2-2020,Pineapple",
"234,1-2-2020,Beer",
"345,1-3-2020,Bacon",
"345,1-4-2020,Cheese",
"345,1-4-2020,Sausage",
"456,1-4-2020,Lemon"};
Create a map.
As you read the lines, split them and add them to the map using the compute method. This puts the line in if the key (number and date) doesn't exist yet; otherwise it appends the last item to the existing entry.
The file does not have to be sorted, but the values are appended in the order in which they are encountered.
Map<String, String> map = new LinkedHashMap<>();
for (String line : lines) {
String[] vals = line.split(",");
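// if v is null, add the line
// if v exists, take the existing line and append the last value
// (the "," in the key keeps ID and date apart, so e.g. 12 + 31-1-2020 cannot collide with 123 + 1-1-2020)
map.compute(vals[0] + "," + vals[1], (k,v)->v == null ? line : v +","+vals[2]);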
}
for (String line : map.values()) {
String[] fields = line.split(",",3);
int count = fields[2].split(",").length;
System.out.printf("%s,%s,%s,%s%n", fields[0],fields[1],count,fields[2]);
}
For this sample, the code prints:
123,1-1-2020,1,Apple
123,1-2-2020,1,Pineapple
234,1-2-2020,1,Beer
345,1-3-2020,1,Bacon
345,1-4-2020,2,Cheese,Sausage
456,1-4-2020,1,Lemon
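
As a variation on the same idea, the item names can be collected into a list per key, so the count is simply the list size rather than the result of re-splitting the stored line. This is only a sketch: the class name GroupReceipts and the helper group are hypothetical, and it assumes every line has exactly the three fields shown in the sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupReceipts {
    // Hypothetical helper: groups item names by "ID,date" and formats one line per group.
    static List<String> group(List<String> lines) {
        // LinkedHashMap keeps the groups in the order their keys were first seen.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            String[] vals = line.split(",", 3);
            String key = vals[0] + "," + vals[1];
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(vals[2]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            List<String> items = e.getValue();
            out.add(e.getKey() + "," + items.size() + "," + String.join(",", items));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "345,1-3-2020,Bacon",
                "345,1-4-2020,Cheese",
                "345,1-4-2020,Sausage",
                "456,1-4-2020,Lemon");
        group(lines).forEach(System.out::println);
    }
}
```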

How to remove the duplicate string?

In my code I have two files on my drive. Both contain some text, and I want to display their contents in the console, but any string that is repeated should be displayed only once rather than twice.
Code:
public class read {
public static void main(String[] args) {
try{
File file = new File("D:\\file1.txt");
FileReader fileReader = new FileReader(file);
BufferedReader br = new BufferedReader(fileReader);
StringBuffer stringBuffer = new StringBuffer();
String line;
while((line = br.readLine()) != null){
stringBuffer.append(line);
stringBuffer.append("\n");
}
fileReader.close();
System.out.println("Contents of file1:");
String first = stringBuffer.toString();
System.out.println(first);
File file1 = new File("D:\\file2.txt");
FileReader fileReader1 = new FileReader(file1);
BufferedReader br1 = new BufferedReader(fileReader1);
StringBuffer stringBuffer1 = new StringBuffer();
String line1;
while((line1 = br1.readLine()) != null){
stringBuffer1.append(line1);
stringBuffer1.append("\n");
}
fileReader1.close();
System.out.println("Contents of file2:");
String second = stringBuffer1.toString();
System.out.println(second);
System.out.println("answer:");
System.out.println(first+second);
}catch (IOException e) {
// TODO: handle exception
e.printStackTrace();
}
}
}
Output is:
answer:
hi hello
how are you
hi ya
i am fine
But I want to compare both the strings and if the same string repeated then that string should be displayed once.
Output I expect is like this:
answer:
hi hello
how are you
ya
i am fine
Where the "hi" is found in both the strings so that I need to delete the one duplicate string.
How can I do that please help.
Thanks in advance.

You can pass your lines through this method to parse out duplicate words:
// store unique previous words
static Set<String> words = new HashSet<>();
static String removeDuplicateWords(String line) {
StringJoiner sj = new StringJoiner(" ");
// split on whitespace to get distinct words
for (String word : line.split("\\s+")) {
// try to add word to the set
if (words.add(word)) {
// if the word was added (=not seen before), append to the result
sj.add(word);
}
}
return sj.toString();
}
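
To show how such a method could be applied to the lines from both files, here is a small self-contained sketch. It is a variation, not the exact method above: the set of seen words is passed in as a parameter (an assumption made here so the example has no shared static state), and the class name DedupDemo is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

class DedupDemo {
    // Returns the line with every word removed that is already in 'seen';
    // new words are added to 'seen' as a side effect.
    static String removeDuplicateWords(String line, Set<String> seen) {
        StringJoiner sj = new StringJoiner(" ");
        for (String word : line.split("\\s+")) {
            if (seen.add(word)) { // add() returns true only for words not seen before
                sj.add(word);
            }
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        // Lines of file1 followed by lines of file2, as in the question.
        String[] lines = {"hi hello", "how are you", "hi ya", "i am fine"};
        Set<String> seen = new HashSet<>();
        for (String line : lines) {
            System.out.println(removeDuplicateWords(line, seen));
        }
        // prints:
        // hi hello
        // how are you
        // ya
        // i am fine
    }
}
```

A line made up entirely of already-seen words comes back as an empty string, so the caller may want to skip printing it.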

Trying to read in multiple lines of a file into string array depending on what the line starts with then enter into a map

Why do I only get one entry in the map when I run this code? There are thousands of lines in the file I'm reading, but it only seems to get to the first line and stop.
public class Details {
public Map<String, String> dictionaryWords() throws IOException{
String cvsSplitBy = ",";
Collection<String> words = new TreeSet<String>();
Map<String,String> m = new TreeMap<String,String>();
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("dictionary.csv")));
String line = null;
String [] word = null;
String remove = null;
String nextline = null;
String getAllLines = "-";
while ((line = br.readLine())!= null) {
if (line.startsWith("\"")) {
getAllLines = line;
while((nextline = br.readLine())!= null){
if(!nextline.startsWith("\"")){
getAllLines.concat(nextline);
}else{
}
words.add(getAllLines);
word = getAllLines.split(cvsSplitBy);
remove = word[0].replace('"', '-');
m.put(remove.toLowerCase(),Arrays.toString(word));
}
}else{
}
}
for (String key : m.keySet()) {
System.out.println(key + " " + m.get(key));}
return m;
}

Try the following code
if(!nextline.startsWith("\""))
{
getAllLines = getAllLines.concat(nextline);
}
Don't forget to reassign "getAllLines" to the return value of the .concat() function. Since Strings are immutable, the .concat() function returns a new String object, which you do not assign to anything (therefore it is lost). This leaves you with your original String still stored in "getAllLines" as if the call to .concat() was never made.
Feel free to use the StringBuilder class and its append method, which will likely be much faster than creating new Strings via .concat() thousands of times.
Also: You do not need blank else{} statements.
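
The StringBuilder suggestion could look roughly like the sketch below. It makes several assumptions that are not in the question: the lines are already in a list (for example from Files.readAllLines), a line starting with a double quote begins a new entry, every other line continues the entry in progress, and the names EntryJoiner and joinEntries are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EntryJoiner {
    // Hypothetical helper: merges continuation lines into the entry they belong to.
    static List<String> joinEntries(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null; // entry being built, null until the first quoted line
        for (String line : lines) {
            if (line.startsWith("\"")) {
                if (current != null) {
                    entries.add(current.toString()); // previous entry is complete
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append(line); // append() mutates the builder, so nothing is lost
            }
        }
        if (current != null) {
            entries.add(current.toString()); // do not forget the last entry
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"apple\",noun,a fruit", " that grows on trees", "\"bee\",noun,an insect");
        joinEntries(lines).forEach(System.out::println);
    }
}
```

Each joined entry can then be split on "," and put into the map once, after it is complete.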

In the following part of your code the nextlines (2nd ...) are lost in space. They are saved in the variable nextline and used as a parameter for getAllLines.concat. But the return value of String::concat is not assigned to anything.
...
while((nextline = br.readLine())!= null){
if(!nextline.startsWith("\"")){
getAllLines.concat(nextline);
}else{
...

Read the each string text from file in java

I am new to Java. I just want to read each string from a file and print it on the console.
Code:
public static void main(String[] args) throws Exception {
File file = new File("/Users/OntologyFile.txt");
try {
FileInputStream fstream = new FileInputStream(file);
BufferedReader infile = new BufferedReader(new InputStreamReader(
fstream));
String data = new String();
while ((data = infile.readLine()) != null) { // use if for reading just 1 line
System.out.println(""+data);
}
} catch (IOException e) {
// Error
}
}
If file contains:
Add label abc to xyz
Add instance cdd to pqr
I want to read each word from file and print it to a new line, e.g.
Add
label
abc
...
And afterwards, I want to extract the index of a specific string, for instance get the index of abc.
Can anyone please help me?

It sounds like you want to be able to do two things:
Print all words inside the file
Search the index of a specific word
In that case, I would suggest scanning all lines, splitting by any whitespace character (space, tab, etc.) and storing the words in a collection so you can search it later on. Now the question is: can you have repeats, and in that case which index would you like to print? The first? The last? All of them?
Assuming words are unique, you can simply do:
public static void main(String[] args) throws Exception {
File file = new File("/Users/OntologyFile.txt");
ArrayList<String> words = new ArrayList<String>();
try {
FileInputStream fstream = new FileInputStream(file);
BufferedReader infile = new BufferedReader(new InputStreamReader(
fstream));
String data = null;
while ((data = infile.readLine()) != null) {
for (String word : data.split("\\s+")) {
words.add(word);
System.out.println(word);
}
}
} catch (IOException e) {
// Error
}
// search for the index of abc:
for (int i = 0; i < words.size(); i++) {
if (words.get(i).equals("abc")) {
System.out.println("abc index is " + i);
break;
}
}
}
If you don't break, it'll print every index of abc (if words are not unique). You could of course optimize it more if the set of words is very large, but for a small amount of data, this should suffice.
Of course, if you know in advance which words' indices you'd like to print, you could forego the extra data structure (the ArrayList) and simply print that as you scan the file, unless you want the printings (of words and specific indices) to be separate in output.
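
For a large set of words, one possible optimization is to record every position of every word in a map while scanning, so each later lookup avoids a linear search. This is only a sketch of that idea: the class name WordIndex and the helper index are hypothetical, and positions are counted across the whole file starting at 0, matching the list indices used above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordIndex {
    // Hypothetical helper: maps each word to all positions at which it occurs.
    static Map<String, List<Integer>> index(List<String> lines) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int i = 0; // running word position across all lines
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // a blank line yields one empty token
                }
                positions.computeIfAbsent(word, k -> new ArrayList<>()).add(i);
                i++;
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> positions = index(Arrays.asList(
                "Add label abc to xyz", "Add instance cdd to pqr"));
        System.out.println(positions.get("abc")); // prints [2]
        System.out.println(positions.get("Add")); // prints [0, 5]
    }
}
```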

Split the String received for any whitespace with the regex \\s+ and print out the resultant data with a for loop.
public static void main(String[] args) { // Don't make main throw an exception
File file = new File("/Users/OntologyFile.txt");
try {
FileInputStream fstream = new FileInputStream(file);
BufferedReader infile = new BufferedReader(new InputStreamReader(fstream));
String data;
while ((data = infile.readLine()) != null) {
String[] words = data.split("\\s+"); // Split on whitespace
for (String word : words) { // Iterate through info
System.out.println(word); // Print it
}
}
} catch (IOException e) {
// Probably best to actually have this on there
System.err.println("Error found.");
e.printStackTrace();
}
}

Just add a for-each loop before printing the output :-
while ((data = infile.readLine()) != null) { // use if for reading just 1 line
for(String temp : data.split(" "))
System.out.println(temp); // no need to concatenate the empty string.
}
This will automatically print the individual strings, obtained from each String line read from the file, in a new line.
And afterwards, I want to extract the index of a specific string, for
instance get the index of abc.
I don't know which index you are actually talking about. But if you want the position within each line being read, add a temporary variable count initialised to 0.
Increment it for each word until the word equals abc. Because count is incremented before the comparison, the position printed is 1-based. Like,
int count = 0;
for(String temp : data.split(" ")){
count++;
if("abc".equals(temp))
System.out.println("Index of abc is : "+count);
System.out.println(temp);
}

Use the split() method available in class String; you may adapt it to your needs.
Or
use the line's length() to iterate over the complete line character by character,
and whenever you reach a non-alphabetic character, take the substring() up to that point and write it on a new line.
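
The second suggestion, walking over the characters by hand, might be sketched as follows. The class name CharSplit and the helper splitOnNonLetters are hypothetical, and treating every non-letter (as decided by Character.isLetter) as a separator is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

class CharSplit {
    // Hypothetical helper: cuts a line into runs of letters using length(), charAt() and substring().
    static List<String> splitOnNonLetters(String line) {
        List<String> words = new ArrayList<>();
        int start = -1; // index where the current word began, -1 when outside a word
        for (int i = 0; i < line.length(); i++) {
            if (Character.isLetter(line.charAt(i))) {
                if (start < 0) {
                    start = i; // first letter of a new word
                }
            } else if (start >= 0) {
                words.add(line.substring(start, i)); // word ended just before index i
                start = -1;
            }
        }
        if (start >= 0) {
            words.add(line.substring(start)); // word that runs to the end of the line
        }
        return words;
    }

    public static void main(String[] args) {
        for (String word : splitOnNonLetters("Add label abc to xyz")) {
            System.out.println(word);
        }
    }
}
```

In most cases split("\\s+") is shorter and clearer; the manual loop is mainly useful when the separator rules are more involved.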

List<String> words = new ArrayList<String>();
while ((data = infile.readLine()) != null) {
    String[] parts = data.split(" ");
    for (String d : parts) {
        System.out.println(d);
    }
    // add the individual words, not the whole line
    words.addAll(Arrays.asList(parts));
}
// The words list now holds all the words. Use words.indexOf("abc") to get the index
int index = words.indexOf("abc");
if (index < 0) {
    System.out.println("word not present");
} else {
    System.out.println("word present at index " + index);
}

How to trim the elements before assigning it into an array list?

I need to assign the elements present in a CSV file to an ArrayList. The CSV file contains filenames with the extension .tar. I need to trim those elements before I read them into the ArrayList, or trim the whole ArrayList afterwards. Please help me with it.
try
{
String strFile1 = "D:\\Ramakanth\\PT2573\\target.csv"; //csv file containing data
BufferedReader br1 = new BufferedReader( new FileReader(strFile1)); //create BufferedReader
String strLine1 = "";
StringTokenizer st1 = null;
while( (strLine1 = br1.readLine()) != null) //read comma separated file line by line
{
st1 = new StringTokenizer(strLine1, ","); //break comma separated line using ","
while(st1.hasMoreTokens())
{
array1.add(st1.nextToken()); //store csv values in array
}
}
}
catch(Exception e)
{
System.out.println("Exception while reading csv file: " + e);
}

If you want to remove the ".tar" string from your tokens, you can use:
String nextToken = st1.nextToken();
if (nextToken.endsWith(".tar")) {
nextToken = nextToken.replace(".tar", "");
}
array1.add(nextToken);

You shouldn't be using StringTokenizer. The JavaDoc says (in part): StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
You should also close your BufferedReader; a try-with-resources statement does that for you. You might use a for-each loop to iterate over the array produced by String.split(String); the regular expression below optionally matches whitespace before or after each comma. Finally, you might continue the loop if the token endsWith ".tar", like this:
String strFile1 = "D:\\Ramakanth\\PT2573\\target.csv";
try (BufferedReader br1 = new BufferedReader(new FileReader(strFile1)))
{
String strLine1 = "";
while( (strLine1 = br1.readLine()) != null) {
String[] parts = strLine1.split("\\s*,\\s*");
for (String token : parts) {
if (token.endsWith(".tar")) continue; // <-- don't add "tar" files.
array1.add(token);
}
}
}
catch(Exception e)
{
System.out.println("Exception while reading csv file: " + e);
}
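
If the goal is to keep the file names but drop only a trailing ".tar", the same split-based loop can strip the suffix instead of skipping the token. The sketch below is one way to do that; the class name TarTrim and the helper parseLine are hypothetical, and only a suffix at the very end of a token is removed.

```java
import java.util.ArrayList;
import java.util.List;

class TarTrim {
    private static final String SUFFIX = ".tar";

    // Hypothetical helper: splits one CSV line and removes a trailing ".tar" from each value.
    static List<String> parseLine(String line) {
        List<String> out = new ArrayList<>();
        for (String token : line.split("\\s*,\\s*")) {
            token = token.trim(); // the regex only handles whitespace around commas
            if (token.endsWith(SUFFIX)) {
                // cut exactly one suffix from the end; other dots in the name are kept
                token = token.substring(0, token.length() - SUFFIX.length());
            }
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("a.tar, b.tar ,c.tar.tar")); // prints [a, b, c.tar]
    }
}
```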

if (str.indexOf(".tar") > 0)
    str = str.substring(0, str.indexOf(".tar"));

while(st1.hasMoreTokens())
{
String input = st1.nextToken();
int index = input.indexOf("."); // Get the position of '.'
if (index >= 0) { // indexOf returns -1 when there is no '.', and substring(0, -1) would throw StringIndexOutOfBoundsException
    array1.add(input.substring(0, index)); // keep the text before the first '.'
} else {
    array1.add(input); // no extension: keep the token unchanged
}
}
