Efficient way to read a file and parse each line - java
I have a text file in the following format: each line starts with a string, which is followed by a sequence of numbers. The number of values per line is not known in advance (anywhere from 0 to 1000).
string_1 3 90 12 0 3
string_2 49 0 12 94 13 8 38 1 95 3
.......
string_n 9 43
Afterwards I must handle each line with a handleLine method that accepts two arguments: the string name and the set of numbers (see code below).
How can I read the file and handle each line with handleLine efficiently?
My current approach:
Read the file line by line with Java 8 streams (Files.lines). Is it blocking?
Split each line with a regexp
Convert each line into a header string and a set of numbers
I think this is pretty inefficient because of the 2nd and 3rd steps. The 1st step means that Java first converts the file bytes to a String, and then in the 2nd and 3rd steps I convert that String again into a String/Set<Integer>. Does that affect performance much? If so, how can I do better?
public void handleFile(String filePath) {
try (Stream<String> stream = Files.lines(Paths.get(filePath))) {
stream.forEach(this::indexLine);
} catch (IOException e) {
e.printStackTrace();
}
}
private void handleLine(String line) {
List<String> resultList = this.parse(line);
String string_i = resultList.remove(0);
Set<Integer> numbers = resultList.stream().map(Integer::valueOf).collect(Collectors.toSet());
handleLine(string_i, numbers); // Here is the final computation, which must be done with only the string_i & numbers arguments
}
private List<String> parse(String str) {
List<String> output = new LinkedList<String>();
Matcher match = Pattern.compile("[0-9]+|[a-z]+|[A-Z]+").matcher(str);
while (match.find()) {
output.add(match.group());
}
return output;
}
Regarding your first question, it depends on how you use the Stream. Streams are inherently lazy and do no work until a result is actually requested. For example, the call to Files.lines doesn't actually read the file until you invoke a terminal operation on the Stream.
From the Javadoc:
Read all lines from a file as a Stream. Unlike readAllLines, this method does not read all lines into a List, but instead populates lazily as the stream is consumed
The forEach(Consumer<T>) call is a terminal operation, and, at that point, the lines of the file are read one by one and passed to your indexLine method.
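A minimal sketch of that laziness, assuming a small temporary file (the class name LazyLinesDemo and the file contents are made up for illustration). The counter attached with peek stays at zero until forEach runs:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

final class LazyLinesDemo {
    /** Returns {lines seen before the terminal operation, lines seen after it}. */
    static int[] demo() {
        try {
            // Hypothetical three-line input written to a temp file.
            Path file = Files.createTempFile("lazy-lines", ".txt");
            try {
                Files.write(file, Arrays.asList("string_1 3 90", "string_2 49 0", "string_n 9 43"));
                AtomicInteger seen = new AtomicInteger();
                try (Stream<String> lines = Files.lines(file)) {
                    // peek is an intermediate operation: nothing is read yet.
                    Stream<String> counted = lines.peek(line -> seen.incrementAndGet());
                    int before = seen.get();
                    // forEach is the terminal operation that pulls lines from the file.
                    counted.forEach(line -> { });
                    return new int[] { before, seen.get() };
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling LazyLinesDemo.demo() should report no lines seen before the terminal operation and all three afterwards.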
Regarding your other comments, you don't really have a question here. What are you trying to measure/minimize? Having multiple steps does not by itself make the code slow. Even if you wrote a whiz-bang one-liner to convert the file bytes directly into your String & Set, it would probably just do the same intermediate mapping anonymously, or call something that performs that mapping internally anyway.
Here is code to parse a line into a name and numbers:
stream.forEach(line -> {
String[] split = line.split("\\b"); // split on word boundaries (\b); the whitespace between tokens becomes its own element
Set<String> numbers = IntStream.range(1, split.length)
.mapToObj(index -> split[index])
.filter(str -> str.matches("\\d+")) //filter numbers
.collect(Collectors.toSet());
handleLine(split[0], numbers);
});
Or another way
Map<Boolean, List<String>> collect = Pattern.compile("\\b")
.splitAsStream(line)
.filter(str -> !str.matches("\\b"))
.collect(Collectors.groupingBy(str -> str.matches("\\d+")));
handleLine(collect.get(Boolean.FALSE).get(0), collect.get(Boolean.TRUE));
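If the regex work in the snippets above turns out to matter, one alternative is to scan for spaces by hand. This is only a sketch under the assumption that tokens are separated by single spaces and every token after the first is a valid int; the class name LineParser and the use of Map.Entry as a pair are illustrative choices, not part of the original code.

```java
import java.util.AbstractMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class LineParser {
    /** Parses "name n1 n2 ..." into (name, set of numbers) without any regex. */
    static Map.Entry<String, Set<Integer>> parse(String line) {
        int firstSpace = line.indexOf(' ');
        String name = firstSpace < 0 ? line : line.substring(0, firstSpace);
        Set<Integer> numbers = new HashSet<>();
        int pos = firstSpace < 0 ? line.length() : firstSpace + 1;
        while (pos < line.length()) {
            int next = line.indexOf(' ', pos);
            if (next < 0) {
                next = line.length();
            }
            if (next > pos) { // skip empty tokens caused by repeated spaces
                numbers.add(Integer.parseInt(line.substring(pos, next)));
            }
            pos = next + 1;
        }
        return new AbstractMap.SimpleImmutableEntry<>(name, numbers);
    }
}
```

The result can be handed straight to the two-argument handleLine, e.g. handleLine(entry.getKey(), entry.getValue()). Whether this is measurably faster than split should be checked on real data.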
I set out to test several ways to go about this problem and measure the performance as best I could under noted conditions. Here's what I tested and how I tested it, along with the accompanying results:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;
import java.util.Scanner;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;
public class App {
public static void method1(String testFile) {
List<Integer> nums = null;
try (Scanner s = new Scanner(Paths.get(testFile))) {
while (s.hasNext()) {
if (s.hasNextInt())
nums.add(s.nextInt());
else {
nums = new ArrayList<Integer>();
String pre = s.next();
// handleLine( s.next() ... nums ... );
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
public static void method2(String testFile) {
List<Integer> nums = null;
try (BufferedReader in = new BufferedReader(new FileReader(testFile));
Scanner s = new Scanner(in)) {
while (s.hasNext()) {
if (s.hasNextInt())
nums.add(s.nextInt());
else {
nums = new ArrayList<Integer>();
String pre = s.next();
// handleLine( s.next() ... nums ... );
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
public static void method3(String testFile) {
List<Integer> nums = null;
try (BufferedReader br = new BufferedReader(new FileReader(testFile))) {
String line = null;
while ((line = br.readLine()) != null) {
String[] arr = line.split(" ");
nums = new ArrayList<Integer>();
for (int i = 1; i < arr.length; ++i)
nums.add(Integer.valueOf(arr[i]));
// handleLine( ... );
}
} catch (IOException e) {
e.printStackTrace();
}
}
public static void method3_1(String testFile) {
List<Integer> nums = null;
try (BufferedReader br = new BufferedReader(new FileReader(testFile))) {
String line = null;
while ((line = br.readLine()) != null) {
String[] arr = line.split(" ");
nums = new ArrayList<Integer>();
for (int i = 1; i < arr.length; ++i)
nums.add(Integer.parseInt(arr[i]));
// handleLine( ... );
}
} catch (IOException e) {
e.printStackTrace();
}
}
public static void method4(String testFile) {
List<Integer> nums = null;
try {
List<String> lines = Files.readAllLines(Paths.get(testFile));
for (String s : lines) {
String[] arr = s.split(" ");
nums = new ArrayList<Integer>();
for (int i = 1; i < arr.length; ++i)
nums.add(Integer.valueOf(arr[i]));
// handleLine( ... );
}
} catch (IOException e) {
e.printStackTrace();
}
}
public static void method4_1(String testFile) {
List<Integer> nums = null;
try {
List<String> lines = Files.readAllLines(Paths.get(testFile));
for (String s : lines) {
String[] arr = s.split(" ");
nums = new ArrayList<Integer>();
for (int i = 1; i < arr.length; ++i)
nums.add(Integer.parseInt(arr[i]));
// handleLine( ... );
}
} catch (IOException e) {
e.printStackTrace();
}
}
public static void method5(String testFile) {
List<Integer> nums = null;
try (BufferedReader br = Files.newBufferedReader(Paths.get(testFile))) {
List<String> lines = br.lines().collect(Collectors.toList());
for (String s : lines) {
String[] arr = s.split(" ");
nums = new ArrayList<Integer>();
for (int i = 1; i < arr.length; ++i)
nums.add(Integer.valueOf(arr[i]));
// handleLine( ... );
}
} catch (IOException e) {
e.printStackTrace();
}
}
public static void method5_1(String testFile) {
List<Integer> nums = null;
try (BufferedReader br = Files.newBufferedReader(Paths.get(testFile))) {
List<String> lines = br.lines().collect(Collectors.toList());
for (String s : lines) {
String[] arr = s.split(" ");
nums = new ArrayList<Integer>();
for (int i = 1; i < arr.length; ++i)
nums.add(Integer.parseInt(arr[i]));
// handleLine( ... );
}
} catch (IOException e) {
e.printStackTrace();
}
}
public static void method6(String testFile) {
List<Integer> nums = new LinkedList<Integer>();
try (Stream<String> stream = Files.lines(Paths.get(testFile))) {
stream.forEach(line -> {
String[] split = line.split("\\b"); // split on word boundaries (\b), not on blanks
Set<String> numbers = IntStream.range(1, split.length)
.mapToObj(index -> split[index])
.filter(str -> str.matches("\\d+")) // filter numbers
.collect(Collectors.toSet());
numbers.forEach((k) -> nums.add(Integer.parseInt(k)));
// handleLine( ... );
});
} catch (IOException e) {
e.printStackTrace();
}
}
public static void main(String[] args) throws Exception {
args = new String[] { "C:\\Users\\Nick\\Desktop\\test.txt" };
Random r = new Random();
System.out.println("warming up a little...");
for (int i = 0; i < 100000; ++i) {
int x = r.nextInt();
}
long s1 = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i)
method1(args[0]);
long e1 = System.currentTimeMillis();
long s2 = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i)
method2(args[0]);
long e2 = System.currentTimeMillis();
long s3 = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i)
method3(args[0]);
long e3 = System.currentTimeMillis();
long s3_1 = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i)
method3_1(args[0]);
long e3_1 = System.currentTimeMillis();
long s4 = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i)
method4(args[0]);
long e4 = System.currentTimeMillis();
long s4_1 = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i)
method4_1(args[0]);
long e4_1 = System.currentTimeMillis();
long s5 = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i)
method5(args[0]);
long e5 = System.currentTimeMillis();
long s5_1 = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i)
method5_1(args[0]);
long e5_1 = System.currentTimeMillis();
long s6 = System.currentTimeMillis();
for (int i = 0; i < 10000; ++i)
method6(args[0]);
long e6 = System.currentTimeMillis();
System.out.println("method 1 = " + (e1 - s1) + " ms");
System.out.println("method 2 = " + (e2 - s2) + " ms");
System.out.println("method 3 = " + (e3 - s3) + " ms");
System.out.println("method 3_1 = " + (e3_1 - s3_1) + " ms");
System.out.println("method 4 = " + (e4 - s4) + " ms");
System.out.println("method 4_1 = " + (e4_1 - s4_1) + " ms");
System.out.println("method 5 = " + (e5 - s5) + " ms");
System.out.println("method 5_1 = " + (e5_1 - s5_1) + " ms");
System.out.println("method 6 = " + (e6 - s6) + " ms");
}
}
Used with java.version = 1.8.0_101 (Oracle)
x64 OS/processor
Result output:
warming up a little...
method 1 = 1103 ms
method 2 = 872 ms
method 3 = 440 ms
method 3_1 = 418 ms
method 4 = 413 ms
method 4_1 = 376 ms
method 5 = 439 ms
method 5_1 = 384 ms
method 6 = 646 ms
To my understanding, the best approach among those I tested was the combination of Files.readAllLines, s.split(" "), and Integer.parseInt. That combination was the fastest within the sample I created and tested. At the very least, switching to Integer.parseInt may help somewhat.
Note: I drew on other sources for some of these approaches and applied them to this problem/example, e.g. a blog post, a tutorial, and advice from Peter Lawrey. Also, further improvements can always be made!
Also, the test.txt file:
my_name 15 00 29 101 1234
cool_id 11 00 01 10 010101
longer_id_name 1234
dynamic_er 1 2 3 4 5 6 7 8 9 10 11 12 123 1456 15689 555555555
(note: performance may greatly vary depending on file size!)
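The timings above come from System.currentTimeMillis around fixed loops, with a warm-up that does not exercise the measured methods. A slightly more careful hand-rolled harness could look like the sketch below; MiniBench and its parameter names are hypothetical, and a dedicated tool such as JMH would be the more rigorous choice.

```java
final class MiniBench {
    /**
     * Runs the task warmupRuns times without timing it (so the JIT has a chance
     * to compile the code under test), then returns the average nanoseconds
     * per run over timedRuns further executions.
     */
    static long averageNanos(Runnable task, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / timedRuns;
    }
}
```

With the App class above this could be used as MiniBench.averageNanos(() -> App.method3(path), 1000, 10000), where path is whatever test file is being measured.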
Related
Handle large data received through a TCP socket with multiple threads in Java
I have a server socket; every second, a client sends data to the server. The data is a String containing about 5000 lines, and I want to split that data into 5 parts to be handled by 5 threads at the same time.
private void listening() {
    while (true) {
        try {
            clientSocket = serverSocket.accept();
            System.out.println(clientSocket.getInetAddress());
            BufferedReader os = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));
            new Thread(() -> {
                try {
                    while (true) {
                        String data = os.readLine();
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }).start();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This is example code. What should I do to split the data into 5 parts?
Example data:
NVL01_1,20210624045425,172.67.216.146,5027,227.1.50.52,8870,212.133.114.73,2017 NVL01_1,20210624045425,193.25.63.53,6313,216.243.18.239,4445,227.236.233.188,2528 NVL01_1,20210624045425,111.176.240.164,2254,53.3.85.55,3829,72.195.203.220,8903 NVL01_1,20210624045425,223.224.123.173,1596,237.81.112.22,5669,25.193.178.6,5719 NVL01_1,20210624045425,178.89.46.197,489,140.87.132.177,4772,154.172.63.136,3045 NVL01_1,20210624045425,25.201.145.226,3004,234.138.243.22,6831,107.122.249.80,9609 NVL01_1,20210624045425,94.163.66.108,6041,37.190.105.119,9280,89.212.205.137,7483 NVL01_1,20210624045425,90.119.3.94,8881,96.137.66.26,7281,1.99.109.175,9525 NVL01_1,20210624045425,106.116.39.233,1280,196.62.122.91,1649,60.112.241.253,6697 NVL01_1,20210624045425,179.187.138.181,3870,62.38.25.158,4272,74.152.247.34,5220 NVL01_1,20210624045425,204.11.249.30,4749,234.133.240.8,7808,105.193.120.29,9638 NVL01_1,20210624045425,2.99.210.82,6924,206.153.6.165,7520,81.157.119.248,7638 NVL01_1,20210624045425,84.205.46.70,4275,188.189.94.143,4304,172.70.59.8,1226 NVL01_1,20210624045425,38.133.52.221,9577,87.183.254.244,9694,230.209.104.133,164 NVL01_1,20210624045425,13.43.85.59,2894,10.190.222.113,2948,96.155.28.151,9891 NVL01_1,20210624045425,16.79.32.72,7628,57.163.233.173,1,138.67.131.44,5079 
NVL01_1,20210624045425,99.123.115.184,5113,197.56.206.97,9480,222.162.213.230,9564 NVL01_1,20210624045425,133.126.151.28,7437,3.80.234.183,5566,235.50.191.69,744 NVL01_1,20210624045425,71.86.226.128,5212,163.29.130.8,6954,160.182.239.31,1622 NVL01_1,20210624045425,145.78.71.65,2124,197.135.78.117,340,247.187.243.124,6136 NVL01_1,20210624045425,145.208.217.4,9493,8.138.165.8,8975,11.13.156.146,6828 NVL01_1,20210624045425,46.23.207.136,5328,151.197.27.17,3823,253.221.4.92,7230 NVL01_1,20210624045425,189.204.114.107,6709,44.199.81.116,5490,178.66.79.37,1437 NVL01_1,20210624045425,114.48.39.253,9602,27.38.239.223,1566,224.207.76.203,1899 NVL01_1,20210624045425,42.55.138.38,4812,51.93.10.2,7836,95.189.159.240,9574 NVL01_1,20210624045425,141.24.136.19,422,248.144.61.220,2427,138.88.193.240,2284 NVL01_1,20210624045425,146.176.9.78,6852,198.41.131.88,1094,227.242.134.106,5715 NVL01_1,20210624045425,134.47.77.168,7825,90.1.25.81,9125,175.143.184.94,5291 NVL01_1,20210624045425,131.180.238.244,7408,20.87.233.210,592,148.178.232.143,2782 NVL01_1,20210624045425,127.144.113.136,1375,197.9.246.61,7113,181.163.124.51,4290 NVL01_1,20210624045425,131.204.107.100,7185,192.181.253.8,2237,207.147.69.181,4239 NVL01_1,20210624045425,123.28.117.19,5432,89.11.193.31,9282,34.193.75.180,8747 NVL01_1,20210624045425,96.24.44.203,9186,73.65.43.110,4013,174.193.2.241,8762 NVL01_1,20210624045425,164.248.38.5,3122,245.59.114.8,5506,231.212.210.94,8837 NVL01_1,20210624045425,144.86.166.14,8583,123.127.122.39,8625,6.132.112.158,1653 NVL01_1,20210624045425,195.6.162.254,3597,24.218.41.173,1357,24.55.15.35,921 NVL01_1,20210624045425,75.13.49.219,9779,9.202.212.168,2309,11.142.118.22,1955 NVL01_1,20210624045425,245.132.44.122,9659,12.116.75.191,7258,88.91.180.73,2457 NVL01_1,20210624045425,223.31.193.225,5257,194.245.37.73,4567,197.134.216.13,6327 NVL01_1,20210624045425,251.30.222.188,4178,106.83.17.52,4045,142.99.100.174,6164 
NVL01_1,20210624045425,209.115.15.248,9416,124.213.26.22,128,145.6.19.210,2801 NVL01_1,20210624045425,189.174.30.164,7052,24.191.53.184,8172,20.57.226.30,8362 NVL01_1,20210624045425,235.148.200.174,5072,162.253.12.169,7542,205.85.11.196,553 NVL01_1,20210624045425,164.121.163.241,9549,60.225.45.42,7108,255.147.26.90,7637 NVL01_1,20210624045425,145.3.148.142,7128,76.29.166.83,6432,152.25.4.242,1605 NVL01_1,20210624045425,194.170.50.219,6973,229.63.113.168,5698,164.5.6.101,6650 NVL01_1,20210624045425,39.184.47.229,367,17.180.188.224,5841,70.42.225.241,6074 NVL01_1,20210624045425,36.62.110.27,2587,105.252.86.145,7262,57.63.203.247,4518 NVL01_1,20210624045425,225.173.252.217,4665,115.177.84.223,4614,62.203.148.102,7514 NVL01_1,20210624045425,146.128.170.11,2411,76.187.243.147,4396,224.224.170.32,4872 NVL01_1,20210624045425,27.209.151.174,4614,0.125.68.119,2427,39.208.125.100,940 NVL01_1,20210624045425,88.90.208.193,7722,35.102.255.5,3604,214.45.25.189,7213 NVL01_1,20210624045425,96.33.115.231,5202,128.192.0.70,4048,160.221.24.37,3806 NVL01_1,20210624045425,84.26.118.109,2940,109.36.178.60,3276,170.183.57.80,6159 NVL01_1,20210624045425,225.67.85.90,3034,73.62.181.134,291,97.92.65.165,6845 NVL01_1,20210624045425,160.177.222.98,5610,134.70.105.214,65,24.69.80.75,5193 NVL01_1,20210624045425,142.49.198.59,7820,176.83.196.180,2107,40.68.245.29,9761 NVL01_1,20210624045425,59.199.111.242,734,222.236.118.31,7964,210.83.178.184,4373 NVL01_1,20210624045425,115.106.166.229,5409,77.171.38.150,2611,4.217.213.148,9342 NVL01_1,20210624045425,18.54.5.157,9803,48.47.15.108,4348,224.211.21.208,6431 NVL01_1,20210624045425,135.21.210.96,3068,203.5.250.83,9397,221.89.166.128,3374 NVL01_1,20210624045425,191.223.45.133,9746,227.252.45.227,2955,105.233.104.84,4350 NVL01_1,20210624045425,113.39.211.171,2688,63.230.236.139,2083,213.155.51.185,1973 NVL01_1,20210624045425,92.242.126.24,7434,30.44.168.146,3950,177.251.17.214,7967 
NVL01_1,20210624045425,194.134.48.232,8858,14.13.21.182,9196,236.92.11.13,9344 NVL01_1,20210624045425,130.3.48.196,9380,112.89.224.216,4645,157.199.7.200,1790 NVL01_1,20210624045425,229.36.230.48,8815,116.98.169.138,505,134.232.82.65,727 NVL01_1,20210624045425,67.133.95.171,7594,214.33.143.109,5649,71.73.166.217,3153 NVL01_1,20210624045425,225.153.10.77,5447,139.209.199.128,2845,71.108.112.231,4144 NVL01_1,20210624045425,108.253.199.77,3088,203.35.58.102,8689,138.78.85.194,7954 NVL01_1,20210624045425,48.242.189.77,49,56.20.207.122,9542,179.159.117.240,9634 NVL01_1,20210624045425,47.46.208.195,9766,145.154.85.14,2952,189.187.53.186,7724 NVL01_1,20210624045425,95.124.222.197,9549,227.219.232.255,4794,161.166.17.242,4141
How about using data.split() and then starting a thread for each part, like this?
// this splits the data into at most 5 parts wherever the text is marked by /'/
String[] splitdata = data.split("/'/", 5);
for (int i = 0; i < splitdata.length; i++)
    startThread(splitdata[i]);
which calls
public void startThread(String data) {
    // starts the thread with the split data
}
Here is an example:
String data = "...NVL01_...5719 /'/NVL01_...3045... etc.";
String[] splitdata = data.split("/'/", 5);
System.out.println(splitdata[0]); // ...NVL01_...5719
System.out.println(splitdata[1]); // ...NVL01_...3045 etc.
You would just need to insert some kind of marker where you want the string to be split before sending the data.
With this example, you could do the work.
public static String[] splitFive(String data) {
    int factor = 1;
    String[] parts = new String[5];
    if (data.length() >= 5) {
        // size of each of the first four parts; the fifth takes the remainder
        factor = data.length() / 5;
        parts[0] = data.substring(0, factor);
        parts[1] = data.substring(factor, factor * 2);
        parts[2] = data.substring(factor * 2, factor * 3);
        parts[3] = data.substring(factor * 3, factor * 4);
        parts[4] = data.substring(factor * 4);
    } else {
        for (int i = 0; i < data.length(); i++) {
            parts[i] = String.valueOf(data.charAt(i));
        }
    }
    return parts;
}
String[] result = splitFive("1234");
String[] result2 = splitFive("12345678901234567890--");
Will return:
field String[] result = String[5] { "1", "2", "3", "4", null }
field String[] result2 = String[5] { "1234", "5678", "9012", "3456", "7890--" }
Edited example method to work with lines:
public static List[] splitFive(String data) {
    List[] parts = new List[5];
    String[] allLines = data.split("\n");
    int factor = 1;
    List<String> allLinesList = Arrays.asList(allLines);
    if (allLines.length >= 5) {
        factor = allLines.length / 5;
        parts[0] = allLinesList.subList(0, factor);
        parts[1] = allLinesList.subList(factor, factor * 2);
        parts[2] = allLinesList.subList(factor * 2, factor * 3);
        parts[3] = allLinesList.subList(factor * 3, factor * 4);
        parts[4] = allLinesList.subList(factor * 4, allLinesList.size());
    } else {
        for (int i = 0; i < allLines.length; i++) {
            parts[i] = Collections.singletonList(allLinesList.get(i));
        }
    }
    return parts;
}
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10; i++) {
    sb.append("Line-" + i + "\n");
}
List[] result = splitFive("1234\n4567\n464646464654654\n");
List[] result2 = splitFive(sb.toString());
This will return:
field List[] result = List[5] { [1234], [4567], [464646464654654], null, null }
field List[] result2 = List[5] { [Line-0, Line-1], [Line-2, Line-3], [Line-4, Line-5], [Line-6, Line-7], [Line-8, Line-9] }
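The splitting idea above can be combined with a thread pool so that each part is actually processed on its own thread. The following is a sketch only: ChunkedWork, partition, and countFieldsInParallel are hypothetical names, and counting comma-separated fields merely stands in for whatever per-line handling is really needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ChunkedWork {
    /** Splits items into exactly `parts` consecutive sublists whose sizes differ by at most one. */
    static <T> List<List<T>> partition(List<T> items, int parts) {
        List<List<T>> chunks = new ArrayList<>(parts);
        int base = items.size() / parts;
        int extra = items.size() % parts;
        int from = 0;
        for (int i = 0; i < parts; i++) {
            int to = from + base + (i < extra ? 1 : 0);
            chunks.add(items.subList(from, to));
            from = to;
        }
        return chunks;
    }

    /** Placeholder workload: counts comma-separated fields, one task per chunk. */
    static int countFieldsInParallel(String data, int threads) {
        List<String> lines = Arrays.asList(data.split("\n"));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> chunk : partition(lines, threads)) {
                results.add(pool.submit(() -> {
                    int fields = 0;
                    for (String line : chunk) {
                        fields += line.split(",").length;
                    }
                    return fields;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // waits for that chunk to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike the fixed five assignments, partition spreads any remainder over the first chunks and returns empty sublists rather than null when there are fewer lines than parts. In a long-running server, creating the pool once and reusing it would likely be preferable to creating one per message.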
How to compare two sets, then filter into a new set by matching part of the string?
I'm building some short code to compare 2 HashSets.
SET 1 = noRek : [1234567892, 1234567891, 1234567890]
SET 2 = Source : [1234567890U0113, 1234567894B0111, 1234567890U0112, 1234567891B0111, 1234567890U0115, 1234567890U0114, 1234567892B0113, 1234567893B0111, 1234567890U0111, 1234567890B0111, 1234567892B0112, 1234567892B0111]
public class diff {
    public static void main(String args[]) {
        String filename = "C:\\abc.txt";
        String filename2 = "C:\\xyz.txt";
        HashSet<String> al = new HashSet<String>();
        HashSet<String> al1 = new HashSet<String>();
        HashSet<String> source = new HashSet<String>();
        HashSet<String> noRek = new HashSet<String>();
        HashSet<String> diff1 = new HashSet<String>();
        HashSet<String> diff2 = new HashSet<String>();
        String str = null;
        String str2 = null;
        Integer digitRek = 10;
        Integer digitTransaksi = 15;
        //GET REKDATA FROM TARGET
        try {
            String message = new Scanner(new File(filename2)).useDelimiter("\\Z").next();
            for (int i = 0; i < message.length(); i += digitRek) {
                noRek.add(message.substring(i, Math.min(i + digitRek, message.length())));
            }
            System.out.println("noRek : " + noRek);
        } catch (Exception e) {
            e.printStackTrace();
        }
        try {
            String message2 = new Scanner(new File(filename)).useDelimiter("\\Z").next();
            for (int i = 0; i < message2.length(); i += digitTransaksi) {
                source.add(message2.substring(i, Math.min(i + digitTransaksi, message2.length())));
            }
            System.out.println("Source : " + source);
        } catch (Exception e) {
            e.printStackTrace();
        }
        for (String str3 : source) {
            if (source.contains(noRek.substring(digitRek)) {
                diff1.add(str3);
            }
        }
        System.out.println("Final : " + diff1);
    }
I expect the output of the set diff1 to be like this:
SET 3 = [1234567890U0111, 1234567890U0112, 1234567890U0113,1234567890U0114, 1234567890U0115, 1234567890B0111, 1234567891B0111, 1234567892B0113, 1234567892B0112, 1234567892B0111]
but the actual output is the same as SET 2.
Put simply, I need to match SET 2 against SET 1. In a SET 2 entry, the first 10 digits are the account number, the next character is a code, and the rest is an auto-generated number. That means each SET 2 entry is 15 characters long and each SET 1 entry is 10 digits long. SET 1 holds all account numbers and SET 2 holds the transaction entries; I need to get every transaction in SET 2 that belongs to an account number in SET 1.
You can solve this by using a stream and filter:
Set<String> diff1 = source.stream().filter(str -> {
    if (str.length() > 10) {
        String account = str.substring(0, 10);
        return noRek.contains(account);
    }
    return false;
}).collect(Collectors.toSet());
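For reference, here is the same prefix filter as a self-contained sketch that can be run against the sample sets from the question. The class name AccountFilter and the constant ACCOUNT_DIGITS are made up for this example, and a TreeSet is used only so the result prints in a stable order.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class AccountFilter {
    /** Length of the account-number prefix in each transaction entry (10 in the question). */
    static final int ACCOUNT_DIGITS = 10;

    /** Keeps only the transactions whose first ACCOUNT_DIGITS characters are a known account. */
    static Set<String> transactionsFor(Set<String> accounts, Set<String> transactions) {
        return transactions.stream()
                .filter(t -> t.length() > ACCOUNT_DIGITS
                        && accounts.contains(t.substring(0, ACCOUNT_DIGITS)))
                .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

With SET 1 and SET 2 from the question, the two entries starting with 1234567893 and 1234567894 are dropped and the other ten remain.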
Implementing Elimination of Immediate Left-Recursion in Java
I am working on implementing generic code to solve the left-recursion problem in a grammar using Java. My code works as follows: I read an input like this, where each item goes on its own line:
E
E+T|T
T
T*F|F
F
(E)|id|number
and the required output is supposed to be like this:
E->[TE']
T->[FT']
F->[(E), id, number]
E'->[+TE', !]
T'->[*FT', !]
I wrote the following code, which stores the input in ArrayLists, iterates over them, and produces the output:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
public class IleftRecursion {
    //storing each line in its corresponding Arraylist
    static ArrayList<String> leftRules = new ArrayList<>();
    static ArrayList<String> rightRules = new ArrayList<>();
    public static void read_file(String file) throws IOException {
        FileReader in = new FileReader(file);
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            leftRules.add(line);
            rightRules.add(br.readLine());
        }
        br.close();
    }
    public static void ss() {
        for (int i = 0; i < leftRules.size(); i++) {
            for (int j = 1; j <= i - 1; j++) {
                //splitting inputs on bars "|" to iterate through them
                for (String x : rightRules.get(i).split("\\|")) {
                    if (x.contains(leftRules.get(j))) {
                        String f = "";
                        String ff = "";
                        for (int k=0; k<rightRules.get(k).split("\\|").length;k++) {
                            f = x;
                            f = f.replaceAll(leftRules.get(i), rightRules.get(k).split("\\|")[k]);
                            ff += f;
                        }
                        rightRules.remove(i);
                        rightRules.add(i, ff);
                    }
                }
            }
            //Recursive or Not boolean
            boolean isRec = false;
            for (String z : rightRules.get(i).split("\\|")) {
                if (z.startsWith(leftRules.get(i))) {
                    isRec = true;
                    break;
                }
            }
            if (isRec) {
                String a = "";
                String b = "";
                for (String s : rightRules.get(i).split("\\|")) {
                    if (s.startsWith(leftRules.get(i))) {
                        b += s.replaceAll(leftRules.get(i), "") + leftRules.get(i) + "',";
                    } else {
                        a += s + leftRules.get(i) + "'";
                    }
                }
                b += "!";
                if(a.length()>=1)
                    a.substring(1, a.length() - 1);
                rightRules.add(i, a);
                rightRules.add(i + 1, b);
                leftRules.add(leftRules.get(i) + "'");
            }
        }
    }
    public static void main(String[] args) throws IOException {
        read_file("Sample.in");
        ss();
        for (int i=0;i<leftRules.size();i++) {
            System.out.print(leftRules.get(i)+"->");
            System.out.println("["+rightRules.get(i)+"]");
        }
    }
}
I debugged the code many times trying to figure out why I am getting output like this:
E->[TE']
T->[+TE',!]
F->[T]
E'->[T*F]
It is missing one rule, and not all of the new productions are generated correctly, but I couldn't fix it. Could anyone help me with that?
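For comparison, the textbook rewrite for immediate left recursion turns A -> Aα | β into A -> βA' and A' -> αA' | ε. The sketch below applies only that rule; it is not a repair of the code above. It assumes each nonterminal is a single symbol that no other symbol starts with, writes ε as "!" to match the expected output, and uses the hypothetical names LeftRecursion and eliminate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LeftRecursion {
    /**
     * Rewrites every rule A -> A a1 | ... | b1 | ... as
     * A -> b1 A' | ... and A' -> a1 A' | ... | !
     * Rules without immediate left recursion are copied unchanged.
     */
    static Map<String, List<String>> eliminate(Map<String, List<String>> grammar) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Map<String, List<String>> primed = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> rule : grammar.entrySet()) {
            String head = rule.getKey();
            List<String> recursiveTails = new ArrayList<>();
            List<String> others = new ArrayList<>();
            for (String alternative : rule.getValue()) {
                if (alternative.startsWith(head)) {
                    // A a  becomes  a A'
                    recursiveTails.add(alternative.substring(head.length()) + head + "'");
                } else {
                    others.add(alternative);
                }
            }
            if (recursiveTails.isEmpty()) {
                result.put(head, others);
                continue;
            }
            List<String> rewritten = new ArrayList<>();
            for (String alternative : others) {
                rewritten.add(alternative + head + "'"); // b  becomes  b A'
            }
            recursiveTails.add("!"); // epsilon alternative
            result.put(head, rewritten);
            primed.put(head + "'", recursiveTails);
        }
        result.putAll(primed); // new A' rules are listed after the original heads
        return result;
    }
}
```

Keeping the alternatives in lists rather than concatenated strings avoids the separator bookkeeping, and collecting the primed rules in a separate map keeps the left-hand and right-hand sides aligned.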
How to find similar lines in two text files irrespective of the line number at which they occur
I am trying to open two text files and find similar lines in them. My code correctly reads all the lines from both text files. I have used nested for loops to compare line 1 of the first text file with all lines of the second text file, and so on. However, it only detects similar lines that have the same line number (e.g. line 1 of txt1 is cc cc cc and line 1 of txt2 is cc cc cc, then it correctly finds and prints it), but it doesn't detect identical lines at different line numbers in those files.
import java.io.*;
import java.util.*;
public class FeatureSelection500 {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        // TODO code application logic here
        File f1 = new File("E://implementation1/practise/ComUpdatusPS.exe.hex-04-ngrams-Freq.txt");
        File f2 = new File("E://implementation1/practise/top-300features.txt");
        Scanner scan1 = new Scanner(f1);
        Scanner scan2 = new Scanner(f2);
        int i = 1;
        List<String> txtFileOne = new ArrayList<String>();
        List<String> txtFileTwo = new ArrayList<String>();
        while (scan1.hasNext()) {
            txtFileOne.add(scan1.nextLine());
        }
        while (scan2.hasNext()) {
            txtFileTwo.add(scan2.nextLine());
        }
        /*
        for(String ot : txtFileTwo ) {
            for (String outPut : txtFileOne) {
                // if (txtFileTwo.contains(outPut))
                if(outPut.equals(ot)) {
                    System.out.print(i + " ");
                    System.out.println(outPut);
                    i++;
                }
            }
        }
        */
        for (int j = 0; j < txtFileTwo.size(); j++) {
            String fsl = txtFileTwo.get(j);
            // System.out.println(fileContentSingleLine);
            for (int z = 0; z < 600; z++) // z < txtFileOne.size()
            {
                String s = txtFileOne.get(z);
                // System.out.println(fsl+"\t \t"+ s);
                if (fsl.equals(s)) {
                    System.out.println(fsl + "\t \t" + s);
                    // my line
                    // System.out.println(fsl);
                } else {
                    continue;
                }
            }
        }
    }
}
I made your code look nicer, you're welcome :) Anyway, I don't understand how you get that bug. It runs through all of list1 for every line in list2...
import java.io.*;
import java.util.*;
public class FeatureSelection500 {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        // TODO code application logic here
        File file1 = new File("E://implementation1/practise/ComUpdatusPS.exe.hex-04-ngrams-Freq.txt");
        File file2 = new File("E://implementation1/practise/top-300features.txt");
        Scanner scan1 = new Scanner(file1);
        Scanner scan2 = new Scanner(file2);
        List<String> txtFile1 = new ArrayList<String>();
        List<String> txtFile2 = new ArrayList<String>();
        while (scan1.hasNext()) {
            txtFile1.add(scan1.nextLine());
        }
        while (scan2.hasNext()) {
            txtFile2.add(scan2.nextLine());
        }
        for (int i = 0; i < txtFile2.size(); i++) {
            String lineI = txtFile2.get(i);
            for (int j = 0; j < txtFile1.size(); j++) {
                String lineJ = txtFile1.get(j);
                if (lineI.equals(lineJ)) {
                    System.out.println(lineI + "\t \t" + lineJ);
                }
            }
        }
    }
}
I don't see any problem with your code. Even the block you commented out is absolutely fine. Since you are using equals(), you should make sure that the text is identical (including case) in the two files so that the condition can be satisfied.
for (String ot : txtFileTwo) {
    for (String outPut : txtFileOne) {
        if (outPut.equals(ot)) /* Check Here */ {
            /* Please note that i here will not give you the line number,
               it will just tell you the number of matches in the two files */
            System.out.print(i + " ");
            System.out.println(outPut);
            i++;
        }
    }
}
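Since equals() only matches exactly identical text, one thing worth ruling out is stray leading or trailing whitespace. The sketch below is built on that assumption (the actual cause is not established in this thread); it also replaces the nested loops with a HashSet lookup. CommonLines and common are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CommonLines {
    /** Returns the lines of `second` that also occur anywhere in `first`, ignoring surrounding whitespace. */
    static List<String> common(List<String> first, List<String> second) {
        Set<String> lookup = new HashSet<>();
        for (String line : first) {
            lookup.add(line.trim());
        }
        List<String> matches = new ArrayList<>();
        for (String line : second) {
            String key = line.trim();
            if (lookup.contains(key)) {
                matches.add(key);
            }
        }
        return matches;
    }
}
```

Because the lookup set covers every line of the first file, there is no hard-coded bound such as 600, and the line numbers of the two files play no role.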
Storing Data from File into an Array
So I have a text file with items that look like this:
350279 1 11:54 107.15
350280 3 11:55 81.27
350281 2 11:57 82.11
350282 0 11:58 92.43
350283 3 11:59 86.11
I'm trying to create arrays from those values, in which the first values of each line are in one array, the second values of each line are in another array, and so on. This is all the code I have right now, and I can't seem to figure out how to do it.
package sales;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Sales {
    public static void main (String[] args) throws FileNotFoundException {
        Scanner reader = new Scanner(new File("sales.txt"));
        int[] transID = new int[reader.nextInt()];
        int[] transCode = new int[reader.nextInt()];
        String[] time = new String[reader.next()];
        double[] trasAmount = new double[reader.hasNextDouble()];
    }
}
It's difficult to build an array this way, because arrays have a fixed size... you need to know how many elements they have. If you use a List instead, you don't have to worry about knowing the number of elements in advance. Try this (note: there is no error checking here!):
public static void main (String[] args) throws FileNotFoundException {
    Scanner reader = new Scanner(new File("sales.txt"));
    List<Integer> ids = new LinkedList<>();
    List<Integer> codes = new LinkedList<>();
    List<String> times = new LinkedList<>();
    List<Double> amounts = new LinkedList<>();
    // Load elements into Lists. Note: you can just use the lists if you want
    while (reader.hasNext()) {
        ids.add(reader.nextInt());
        codes.add(reader.nextInt());
        times.add(reader.next());
        amounts.add(reader.nextDouble());
    }
    // Create arrays
    int[] idArray = new int[ids.size()];
    int[] codesArray = new int[codes.size()];
    String[] timesArray = new String[times.size()];
    double[] amountsArray = new double[amounts.size()];
    // Load elements into arrays
    int index = 0;
    for (Integer i : ids) {
        idArray[index++] = i;
    }
    index = 0;
    for (Integer i : codes) {
        codesArray[index++] = i;
    }
    index = 0;
    for (String i : times) {
        timesArray[index++] = i;
    }
    index = 0;
    for (Double i : amounts) { // iterate over amounts, not ids
        amountsArray[index++] = i;
    }
}
Use an ArrayList, because arrays have a fixed size, while with an ArrayList you can add elements dynamically.
Scanner reader = new Scanner(new File("test.txt"));
List<Integer> transID = new ArrayList<Integer>();
List<Integer> transCode = new ArrayList<Integer>();
List<String> time = new ArrayList<String>();
List<Double> trasAmount = new ArrayList<Double>();
while (reader.hasNext()) {
    transID.add(reader.nextInt());
    transCode.add(reader.nextInt());
    time.add(reader.next());
    trasAmount.add(reader.nextDouble());
}
System.out.println(transID.toString());
System.out.println(transCode.toString());
System.out.println(time.toString());
System.out.println(trasAmount.toString());
Output of the above code:
transID [350279, 350280, 350281, 350282, 350283]
transCode [1, 3, 2, 0, 3]
time [11:54, 11:55, 11:57, 11:58, 11:59]
trasAmount [107.15, 81.27, 82.11, 92.43, 86.11]
You'll need a while loop to check for input. Since not all inputs are integers, you might do something like:
while (reader.hasNextLine()) { // checks to make sure there's still a line to be read in the file
    String line = reader.nextLine(); // record that next line
    String[] values = line.split(" "); // split on spaces
    if (values.length == 4) {
        int val1 = Integer.parseInt(values[0]); // parse values
        int val2 = Integer.parseInt(values[1]);
        String val3 = values[2];
        double val4 = Double.parseDouble(values[3]);
        // add these values to your arrays. Might have to "count" the number of lines
        // on a first pass and then run through a second time... I've been using the
        // collections framework for too long to remember exactly how to work with
        // arrays in java when you don't know the size right off the bat.
    }
}
In addition to my comment, here are three ways you can do it.

Reading into separate arrays

int size = 2;
// first allocate some memory for each of your arrays
int[] transID = new int[size];
int[] transCode = new int[size];
String[] time = new String[size];
double[] trasAmount = new double[size];

Scanner reader = new Scanner(new File("sales.txt"));
// keep track of how many elements you have read
int i = 0;

// start reading and continue until there is nothing left to read
while (reader.hasNext()) {
    // since the array size is fixed and you don't know how many lines your file will have,
    // you have to reallocate your arrays when they have reached their maximum capacity
    if (i == size) {
        // increase capacity by 5
        size += 5;
        // allocate larger temporary arrays
        int[] tmp1 = new int[size];
        int[] tmp2 = new int[size];
        String[] tmp3 = new String[size];
        double[] tmp4 = new double[size];
        // copy the content to the newly allocated memory
        System.arraycopy(transID, 0, tmp1, 0, transID.length);
        System.arraycopy(transCode, 0, tmp2, 0, transCode.length);
        System.arraycopy(time, 0, tmp3, 0, time.length);
        System.arraycopy(trasAmount, 0, tmp4, 0, trasAmount.length);
        // point the old array variables at the new memory
        transID = tmp1;
        transCode = tmp2;
        time = tmp3;
        trasAmount = tmp4;
    }

    // read
    transID[i] = Integer.parseInt(reader.next());
    transCode[i] = Integer.parseInt(reader.next());
    time[i] = reader.next();
    trasAmount[i] = Double.parseDouble(reader.next());
    // increment for next line
    i++;
}

reader.close();

// print only the first i elements; the rest of each array is unused capacity
for (int j = 0; j < i; j++) {
    System.out.println("" + j + ": " + transID[j] + ", " + transCode[j] + ", " + time[j] + ", " + trasAmount[j]);
}

As you can see, this is a lot of code.
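The reallocate-and-copy step can be condensed with Arrays.copyOf, which allocates the larger array and copies the old content in one call. The sketch below handles a single int column; the class GrowableIntArray is a made-up name, and doubling the capacity (instead of adding 5) is my own assumption, chosen because it is a common growth policy.

```java
import java.util.Arrays;

public class GrowableIntArray {
    private int[] data = new int[2]; // small initial capacity, as in the answer
    private int size = 0;            // number of elements actually stored

    void add(int value) {
        if (size == data.length) {
            // Arrays.copyOf allocates the bigger array and copies the old elements
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    // Returns a copy trimmed to the elements that were really added
    int[] toArray() {
        return Arrays.copyOf(data, size);
    }

    public static void main(String[] args) {
        GrowableIntArray ids = new GrowableIntArray();
        for (int id = 350279; id <= 350283; id++) {
            ids.add(id);
        }
        System.out.println(Arrays.toString(ids.toArray()));
    }
}
```

This is roughly what ArrayList does internally, just without the boxing of primitive values.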
It is better to use lists, which gets rid of the overhead of reallocating and copying (at least in your own code).

Reading into separate lists

// instantiate your lists
List<Integer> transIDList = new ArrayList<>();
List<Integer> transCodeList = new ArrayList<>();
List<String> timeList = new ArrayList<>();
List<Double> trasAmountList = new ArrayList<>();

reader = new Scanner(new File("sales.txt"));

while (reader.hasNext()) {
    // read
    transIDList.add(Integer.parseInt(reader.next()));
    transCodeList.add(Integer.parseInt(reader.next()));
    timeList.add(reader.next());
    trasAmountList.add(Double.parseDouble(reader.next()));
}

reader.close();

// the lists know their own size, so no separate counter is needed
for (int j = 0; j < transIDList.size(); j++) {
    System.out.println("" + j + ": " + transIDList.get(j) + ", " + transCodeList.get(j) + ", " + timeList.get(j) + ", " + trasAmountList.get(j));
}

See how much smaller the code became? But it can still get better. A line in the sales.txt file seems to hold the data elements of a single entity, so why not put them in an object? For that you could write a class named Trans, something like this:

class Trans {
    public int transID;
    public int transCode;
    public String time;
    public double trasAmount;

    @Override
    public String toString() {
        return transID + ", " + transCode + ", " + time + ", " + trasAmount;
    }
}

Then you can use this class to hold the data you read from your file and put each object of that class in a list.
Reading into a list of objects

reader = new Scanner(new File("sales.txt"));
List<Trans> transList = new ArrayList<>();

while (reader.hasNext()) {
    Trans trans = new Trans();
    trans.transID = Integer.parseInt(reader.next());
    trans.transCode = Integer.parseInt(reader.next());
    trans.time = reader.next();
    trans.trasAmount = Double.parseDouble(reader.next());
    transList.add(trans);
}

reader.close();

// the counter starts at 0 here so the printed numbering matches the output below
int index = 0;
for (Trans trans : transList) {
    System.out.println("" + index++ + ": " + trans);
}

Output of all three methods:

0: 350279, 1, 11:54, 107.15
1: 350280, 3, 11:55, 81.27
2: 350281, 2, 11:57, 82.11
3: 350282, 0, 11:58, 92.43
4: 350283, 3, 11:59, 86.11
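The object-per-line idea also combines well with a line stream. The following is a sketch under my own assumptions: the Trans class is made immutable and given a parse factory method, fields are separated by whitespace, and blank lines are skipped. The names TransReader, parse and readAll are hypothetical and do not appear in the answer above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

public class TransReader {
    // Immutable variant of the Trans class; field names follow the answer
    static final class Trans {
        final int transID;
        final int transCode;
        final String time;
        final double trasAmount;

        Trans(int transID, int transCode, String time, double trasAmount) {
            this.transID = transID;
            this.transCode = transCode;
            this.time = time;
            this.trasAmount = trasAmount;
        }

        // Hypothetical factory: builds one Trans from one line, no error checking
        static Trans parse(String line) {
            String[] f = line.trim().split("\\s+");
            return new Trans(Integer.parseInt(f[0]), Integer.parseInt(f[1]),
                    f[2], Double.parseDouble(f[3]));
        }

        @Override
        public String toString() {
            return transID + ", " + transCode + ", " + time + ", " + trasAmount;
        }
    }

    // Reads every non-blank line from the source and maps it to a Trans
    static List<Trans> readAll(Reader source) {
        try (BufferedReader br = new BufferedReader(source)) {
            return br.lines()
                    .filter(line -> !line.trim().isEmpty())
                    .map(Trans::parse)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory sample instead of sales.txt so the sketch runs anywhere
        String sample = "350279 1 11:54 107.15\n350280 3 11:55 81.27\n";
        for (Trans t : readAll(new StringReader(sample))) {
            System.out.println(t);
        }
    }
}
```

To read the real file, a reader from Files.newBufferedReader(Paths.get("sales.txt")) could be passed to readAll in place of the StringReader.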
Here is some sample code that reads values from the file and writes them into an array. The sample contains the logic for an int array; you can replicate it for other array types as well.

package sales;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class Sales {

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("sales.txt")))) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                String[] tokens = strLine.split(" ");
                int[] transID = convertStringToIntArray(tokens[0]);
                for (int i = 0; i < transID.length; i++) {
                    System.out.print(transID[i]);
                }
                System.out.println(); // one output line per input line
            }
        }
    }

    /**
     * Converts a string of digits to an int array, one digit per element.
     * @param str the string to convert
     * @return the digits of str (-1 for any character that is not a digit)
     */
    private static int[] convertStringToIntArray(String str) {
        int[] intArray = new int[str.length()];
        for (int i = 0; i < str.length(); i++) {
            intArray[i] = Character.digit(str.charAt(i), 10);
        }
        return intArray;
    }
}