Process a CSV file with Java 8 and streams - java

I'm learning Java 8 and I'm trying to process a CSV file in Java:
List<Catalogo> catalogos = new ArrayList<>();
try (Stream<String> lines = Files.lines(Paths.get("src\\main\\resources\\productos.csv"), Charset.forName("Cp1252"))) {
    List<String[]> data = lines.map(s -> s.split(","))
            .collect(Collectors.toList());
    createCatalog(catalogos, data);
    catalogos.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}
public static void createCatalog(List<Catalogo> catalogos, List<String[]> data) {
    for (String[] x : data) {
        for (int i = 0; i < x.length; i++) {
            Catalogo catalogo = new Catalogo();
            catalogo.setCodigo(x[0]);
            catalogo.setProducto(x[1]);
            catalogo.setTipo(x[2]);
            catalogo.setPrecio(x[3]);
            catalogo.setInventario(x[4]);
            catalogos.add(catalogo);
        }
    }
}
I would like to know whether this code can be improved; I'm not happy with the way I've done it.

You can map directly to your object using a constructor that accepts all of the attributes, such as:

try (Stream<String> lines = Files.lines(Paths.get("src\\main\\resources\\productos.csv"), Charset.forName("Cp1252"))) {
    List<Catalogo> catalogos = lines.map(s -> s.split(","))
            .map(s -> new Catalogo(s[0], s[1], s[2], s[3], s[4]))
            .collect(Collectors.toList());
    catalogos.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}

where, based on your existing code, the constructor would have the signature:

Catalogo(String codigo, String producto, String tipo, String precio, String inventario)

This also fixes a bug in createCatalog: the inner for loop over x.length adds each row once per column, so every product currently ends up in the list five times.
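For completeness, here is a runnable sketch of that approach. The Catalogo class is modeled as a Java 16+ record with the same field names as your setters, and the sample rows are made up, since the real productos.csv isn't shown:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CatalogDemo {
    // Minimal Catalogo with the all-args constructor the answer assumes;
    // field names mirror the setters in the question.
    record Catalogo(String codigo, String producto, String tipo,
                    String precio, String inventario) { }

    public static void main(String[] args) {
        // Stand-in for Files.lines(...) so the sketch runs without the CSV file
        Stream<String> lines = Stream.of(
                "C1,Widget,Tool,9.99,25",
                "C2,Gadget,Toy,4.50,100");

        List<Catalogo> catalogos = lines
                .map(s -> s.split(","))
                .map(s -> new Catalogo(s[0], s[1], s[2], s[3], s[4]))
                .collect(Collectors.toList());

        catalogos.forEach(System.out::println);
    }
}
```

If rows can be malformed (fewer than five columns), you would want to filter `s.length == 5` before the constructor call.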

Related

How to delete all duplicate files in a directory? (Java)

How could I write code to delete exactly the duplicates that I find
with the code below? Please be specific when answering, as I am new
to Java and have only very basic knowledge.
private static MessageDigest messageDigest;

static {
    try {
        messageDigest = MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException("cannot initialize SHA-512 hash function", e);
    }
}
public static void findDuplicatedFiles(Map<String, List<String>> lists, File directory) {
    for (File child : directory.listFiles()) {
        if (child.isDirectory()) {
            findDuplicatedFiles(lists, child);
        } else {
            try {
                FileInputStream fileInput = new FileInputStream(child);
                byte[] fileData = new byte[(int) child.length()];
                fileInput.read(fileData);
                fileInput.close();
                String uniqueFileHash = new BigInteger(1, messageDigest.digest(fileData)).toString(16);
                List<String> list = lists.get(uniqueFileHash);
                if (list == null) {
                    list = new LinkedList<String>();
                    lists.put(uniqueFileHash, list);
                }
                list.add(child.getAbsolutePath());
            } catch (IOException e) {
                throw new RuntimeException("cannot read file " + child.getAbsolutePath(), e);
            }
        }
    }
}
Map<String, List<String>> lists = new HashMap<String, List<String>>();
FindDuplicates.findDuplicatedFiles(lists, dir);
for (List<String> list : lists.values()) {
    if (list.size() > 1) {
        System.out.println("\n");
        for (String file : list) {
            System.out.println(file);
        }
    }
}
System.out.println("\n");
Do not read the entire contents of the file into memory. The whole point of an InputStream is that you can read small, manageable chunks of data, so you don’t have to use a great deal of memory.
Imagine if you were trying to check a file that’s one gigabyte in size. By creating a byte array to hold the entire content, you have forced your program to use a gigabyte of RAM. (If the file were two gigabytes or larger, you wouldn’t be able to allocate the byte array at all, since an array may not have more than 2³¹-1 elements.)
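The chunk-by-chunk idea can be sketched by hand with a fixed-size buffer (the 8 KiB buffer size and the class and method names here are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChunkedHash {
    // Hashes an InputStream 8 KiB at a time; memory use stays constant
    // regardless of how large the underlying file is.
    static String hash(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            md.update(buffer, 0, read);   // feed only the bytes actually read
        }
        return new BigInteger(1, md.digest()).toString(16);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[100_000];              // pretend file content
        String h = hash(new ByteArrayInputStream(data));
        System.out.println(h.substring(0, 16) + "...");
    }
}
```

The DigestOutputStream approach below does the same thing with less code.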
The easiest way to compute the hash of a file’s contents is to copy the file to a DigestOutputStream, which is an OutputStream that makes use of an existing MessageDigest:
messageDigest.reset();
try (DigestOutputStream stream = new DigestOutputStream(
        OutputStream.nullOutputStream(), messageDigest)) {
    Files.copy(child.toPath(), stream);
}
String uniqueFileHash = new BigInteger(1, messageDigest.digest()).toString(16);
Scanning directories is easier with the NIO Path / Files classes, because they avoid the awkward recursion of the File class and are much quicker for deep directory trees.
Here is an example scanner which returns a Stream of duplicate groups: each item in the stream is a List<Path> containing two or more identical files.
// Scans a directory and returns a Stream of List<Path> where each list has 2 or more duplicates
static Stream<List<Path>> findDuplicates(Path dir) throws IOException {
    Map<Long, List<Path>> candidates = new HashMap<>();
    BiPredicate<Path, BasicFileAttributes> biPredicate = (p, a) -> a.isRegularFile()
            && candidates.computeIfAbsent(Long.valueOf(a.size()),
                    k -> new ArrayList<>()).add(p);
    try (var stream = Files.find(dir, Integer.MAX_VALUE, biPredicate)) {
        stream.count();   // drain the stream so the biPredicate side effect fills the map
    }
    Predicate<? super List<Path>> twoOrMore = paths -> paths.size() > 1;
    return candidates.values().stream()
            .filter(twoOrMore)
            .flatMap(Duplicate::duplicateChecker)
            .filter(twoOrMore);
}
The above code starts by collating candidates of the same file size, then uses a flatMap operation to compare those candidates byte for byte, so that each remaining List<Path> holds files that are genuinely identical:
// Checks a list of possible duplicates, and returns a stream of definite duplicates
private static Stream<List<Path>> duplicateChecker(List<Path> sameLenPaths) {
    List<List<Path>> groups = new ArrayList<>();
    try {
        for (Path p : sameLenPaths) {
            List<Path> match = null;
            for (List<Path> g : groups) {
                Path prev = g.get(0);
                if (Files.mismatch(prev, p) < 0) {   // -1 means the contents are identical
                    match = g;
                    break;
                }
            }
            if (match == null)
                groups.add(match = new ArrayList<>());
            match.add(p);
        }
    } catch (IOException io) {
        throw new UncheckedIOException(io);
    }
    return groups.stream();
}
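As an aside, the size-grouping step done above with a side-effecting BiPredicate can also be written with Files.walk and a groupingBy collector, which some may find more transparent. A self-contained sketch (class and method names mine):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SizeGroups {
    // Builds the same "candidates of equal size" map as the BiPredicate version,
    // using a plain groupingBy collector instead of a side effect.
    static Map<Long, List<Path>> bySize(Path dir) throws IOException {
        try (Stream<Path> s = Files.walk(dir)) {
            return s.filter(Files::isRegularFile)
                    .collect(Collectors.groupingBy(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    }));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dupes");
        Files.writeString(dir.resolve("a.txt"), "same");
        Files.writeString(dir.resolve("b.txt"), "same");
        Files.writeString(dir.resolve("c.txt"), "different");
        bySize(dir).forEach((size, paths) ->
                System.out.println(size + " bytes -> " + paths.size() + " file(s)"));
    }
}
```

One difference: Files.find with a predicate on BasicFileAttributes avoids a separate Files.size call per file, so the original may be faster on large trees.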
Finally an example launcher:
public static void main(String[] args) throws IOException {
    Path dir = Path.of(args[0]);
    Stream<List<Path>> duplicates = findDuplicates(dir);
    long count = duplicates.peek(System.out::println).count();
    System.out.println("Found " + count + " groups of duplicate files in: " + dir);
}
You will then need to process each list of duplicate files using Files.delete. I have not added Files.delete at the end, so that you can check the results before deciding to delete anything:
// findDuplicates(dir).flatMap(List::stream).forEach(dup -> {
//     try {
//         Files.delete(dup);
//     } catch (IOException io) {
//         throw new UncheckedIOException(io);
//     }
// });
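Be careful: the commented snippet above deletes every file in each group, including the copy you presumably want to keep. A hypothetical helper (names mine) that spares the first entry of each group could look like:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class KeepFirst {
    // Deletes every file in the group except the first one.
    static void deleteAllButFirst(List<Path> group) {
        for (Path dup : group.subList(1, group.size())) {
            try {
                Files.delete(dup);
            } catch (IOException io) {
                throw new UncheckedIOException(io);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("keepfirst");
        Path a = Files.writeString(dir.resolve("a.txt"), "same");
        Path b = Files.writeString(dir.resolve("b.txt"), "same");
        deleteAllButFirst(List.of(a, b));
        System.out.println(Files.exists(a) + " " + Files.exists(b)); // prints "true false"
    }
}
```

You would call it as findDuplicates(dir).forEach(KeepFirst::deleteAllButFirst) once you trust the output.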

How can I read all files line by line from a directory by using Streams?

I have a directory called Files and it has many files. I want to read these files line by line and store them in a
List<List<String>>.
./Files
../1.txt
../2.txt
../3.txt
..
..
and so on.
private List<List<String>> records = new ArrayList<>();

List<Path> filesInFolder = Files.list(Paths.get("input"))
        .filter(Files::isRegularFile)
        .collect(Collectors.toList());

records = Files.lines(Paths.get("input/1.txt"))
        .map(row -> Arrays.asList(row.split(" ")))
        .collect(Collectors.toList());
The logic is basically:
List<List<String>> records = Files.list(Paths.get("input"))
        .filter(Files::isRegularFile)
        .flatMap(path -> Files.lines(path)
                .map(row -> Arrays.asList(row.split(" "))))
        .collect(Collectors.toList());
But you are required to catch the IOException potentially thrown by Files.lines. Further, the stream returned by Files.list should be closed to release the associated resources as soon as possible.
List<List<String>> records; // don't pre-initialize
try (Stream<Path> files = Files.list(Paths.get("input"))) {
    records = files.filter(Files::isRegularFile)
            .flatMap(path -> {
                try {
                    return Files.lines(path)
                            .map(row -> Arrays.asList(row.split(" ")));
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            })
            .collect(Collectors.toList());
} catch (IOException | UncheckedIOException ex) {
    // log the error
    // and if you want a fall-back:
    records = Collections.emptyList();
}
Note that the streams returned by Files.lines used with flatMap are correctly closed automatically, as documented:
Each mapped stream is closed after its contents have been placed into this stream.
It’s also possible to move the map step from the inner stream to the outer:
List<List<String>> records; // don't pre-initialize
try (Stream<Path> files = Files.list(Paths.get("E:\\projects\\nbMJ\\src\\sub"))) {
    records = files.filter(Files::isRegularFile)
            .flatMap(path -> {
                try {
                    return Files.lines(path);
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            })
            .map(row -> Arrays.asList(row.split(" ")))
            .collect(Collectors.toList());
} catch (IOException | UncheckedIOException ex) {
    // log the error
    // and if you want a fall-back:
    records = Collections.emptyList();
}

Convert set to json array using java 8

Following is my code; can it be optimized for Java 8 and made more efficient?
public String LanguageString(Set<Locale> languageSet) throws Exception {
    JSONObject json = new JSONObject();
    JSONObject tempj = new JSONObject();
    JSONArray jArr = new JSONArray();
    try {
        for (Locale locale : languageSet) {
            if (locale != null) {
                tempj = new JSONObject();
                tempj.put("lcode", locale.toLanguageTag());
                tempj.put("ldisplay", locale.getDisplayName());
                jArr.put(tempj);
            }
        }
        json.put("root", jArr);
    } catch (JSONException e) {
        //
    }
    return json.toString();
}
If you want to use Java 8 and Stream API you can use stream, map and reduce to create your final JSONObject, e.g.
public static String languageStringUsingStream(Set<Locale> locales) {
    return new JSONObject()
            .put("root", locales.stream()
                    .map(locale -> new JSONObject()
                            .put("lcode", locale.toLanguageTag())
                            .put("ldisplay", locale.getDisplayName(Locale.ENGLISH)))
                    .reduce(new JSONArray(), JSONArray::put, (a, b) -> a))
            .toString();
}
Note that the (a, b) -> a combiner makes this reduction correct only for sequential streams; don't call parallel() on it. Here you can find a complete example:
https://gist.github.com/wololock/27bd296fc894f6f4594f997057218fb3

Java 8 streams, lambdas

I am trying to learn how to utilize Java 8 features (such as lambdas and streams) in my daily programming, since it makes for much cleaner code.
Here's what I am currently working on:
I get a string stream from a local file with some data which I turn into objects later. The input file structure looks something like this:
Airport name; Country; Continent; some number;
And my code looks like this:
public class AirportConsumer implements AirportAPI {

    List<Airport> airports = new ArrayList<Airport>();

    @Override
    public Stream<Airport> getAirports() {
        Stream<String> stream = null;
        try {
            stream = Files.lines(Paths.get("resources/planes.txt"));
            stream.forEach(line -> createAirport(line));
        } catch (IOException e) {
            e.printStackTrace();
        }
        return airports.stream();
    }

    public void createAirport(String line) {
        String airport, country, continent;
        int length;
        airport = line.substring(0, line.indexOf(';')).trim();
        line = line.replace(airport + ";", "");
        country = line.substring(0, line.indexOf(';')).trim();
        line = line.replace(country + ";", "");
        continent = line.substring(0, line.indexOf(';')).trim();
        line = line.replace(continent + ";", "");
        length = Integer.parseInt(line.substring(0, line.indexOf(';')).trim());
        airports.add(new Airport(airport, country, continent, length));
    }
}
And in my main class I iterate over the object stream and print out the results:
public class Main {

    public void toString(Airport t) {
        System.out.println(t.getName() + " " + t.getContinent());
    }

    public static void main(String[] args) throws IOException {
        Main m = new Main();
        m.whatever();
    }

    private void whatever() throws IOException {
        AirportAPI k = new AirportConsumer();
        Stream<Airport> s;
        s = k.getAirports();
        s.forEach(this::toString);
    }
}
My question is this: How can I optimize this code, so I don't have to parse the lines from the file separately, but instead create a stream of objects Airport straight from the source file? Or is this the extent in which I can do this?
You need to use map() to transform the data as it comes past.
Files.lines(Paths.get("resources/planes.txt"))
.map(line -> createAirport(line));
This will return a Stream<Airport> - if you want to return a List, then you'll need to use the collect method at the end.
This approach is also stateless, which means you won't need the instance-level airports field.
You'll need to update your createAirport method to return something:
public Airport createAirport(String line) {
    String airport = line.substring(0, line.indexOf(';')).trim();
    line = line.replace(airport + ";", "");
    String country = line.substring(0, line.indexOf(';')).trim();
    line = line.replace(country + ";", "");
    String continent = line.substring(0, line.indexOf(';')).trim();
    line = line.replace(continent + ";", "");
    int length = Integer.parseInt(line.substring(0, line.indexOf(';')).trim());
    return new Airport(airport, country, continent, length);
}
If you're looking for a more functional approach to your code, you may want to consider a rewrite of createAirport so it doesn't mutate line. Builders are also nice for this kind of thing.
public Airport createAirport(final String line) {
final String[] fields = line.split(";");
return new Airport(fields[0].trim(),
fields[1].trim(),
fields[2].trim(),
Integer.parseInt(fields[3].trim()));
}
Putting it all together, your class now looks like this:
public class AirportConsumer implements AirportAPI {

    @Override
    public Stream<Airport> getAirports() {
        Stream<Airport> stream;
        try {
            stream = Files.lines(Paths.get("resources/planes.txt"))
                    .map(line -> createAirport(line));
        } catch (IOException e) {
            stream = Stream.empty();
            e.printStackTrace();
        }
        return stream;
    }

    private Airport createAirport(final String line) {
        final String[] fields = line.split(";");
        return new Airport(fields[0].trim(),
                fields[1].trim(),
                fields[2].trim(),
                Integer.parseInt(fields[3].trim()));
    }
}
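If the input file may contain malformed rows, a defensive variation of createAirport (sketched here with a hypothetical Airport record, Java 16+) can skip them instead of throwing:

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class AirportParser {
    record Airport(String name, String country, String continent, int length) { }

    // Returns empty for lines that don't have four well-formed fields,
    // so one bad row doesn't abort the whole stream with an exception.
    static Optional<Airport> tryParse(String line) {
        String[] f = line.split(";");
        if (f.length < 4) return Optional.empty();
        try {
            return Optional.of(new Airport(f[0].trim(), f[1].trim(),
                    f[2].trim(), Integer.parseInt(f[3].trim())));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        List<Airport> airports = Stream.of(
                        "Heathrow; UK; Europe; 3902;",
                        "garbage line")
                .map(AirportParser::tryParse)
                .flatMap(Optional::stream)       // Java 9+: drop the empties
                .collect(Collectors.toList());
        System.out.println(airports.size());     // prints 1
    }
}
```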
The code posted by Steve looks great, but there are still two places that can be improved:
1. How the string is split.
2. It may cause issues if callers forget (or don't know) to close the stream created by calling the getAirports() method, so it's better to finish the task (toList() or whatever) in place.
Here is the code using abacus-common:
try (Reader reader = IOUtil.createBufferedReader(file)) {
    List<Airport> airportList = Stream.of(reader).map(line -> {
        String[] strs = Splitter.with(";").trim(true).splitToArray(line);
        return new Airport(strs[0], strs[1], strs[2], Integer.valueOf(strs[3]));
    }).toList();
} catch (IOException e) {
    throw new RuntimeException(e);
}
// Or with Try:
List<Airport> airportList = Try.stream(file).call(s -> s.map(line -> {
    String[] strs = Splitter.with(";").trim(true).splitToArray(line);
    return new Airport(strs[0], strs[1], strs[2], Integer.valueOf(strs[3]));
}).toList());
Disclosure: I'm the developer of abacus-common.

IllegalStateException with StreamSupplier

I have the following code to do different things in one stream.
private void getBuildInformation(Stream<String> lines)
{
    Supplier<Stream<String>> streamSupplier = () -> lines;
    String buildNumber = null;
    String scmRevision = null;
    String timestamp = null;
    String buildTag = null;

    Optional<String> hasBuildNumber = streamSupplier.get().filter(s -> s.contains(LogProps.PLM_BUILD)).findFirst();
    if (hasBuildNumber.isPresent())
    {
        buildNumber = hasBuildNumber.get();
        String[] temp = buildNumber.split("=");
        if (temp.length >= 2)
            buildNumber = temp[1].trim();
    }

    Optional<String> hasSCMRevision = streamSupplier.get().filter(s -> s.contains(LogProps.SCM_REVISION_50)).findFirst();
    if (hasSCMRevision.isPresent())
    {
        scmRevision = hasSCMRevision.get();
        String[] temp = scmRevision.split(":");
        if (temp.length >= 4)
            scmRevision = temp[3].trim();
    }

    Optional<String> hasBuildTag = streamSupplier.get().filter(s -> s.contains(LogProps.BUILD_TAG_50)).findFirst();
    if (hasBuildTag.isPresent())
    {
        buildTag = hasBuildTag.get();
        String[] temp = buildTag.split(":");
        if (temp.length >= 4)
            buildTag = temp[3].trim();
    }

    Optional<String> hasTimestamp = streamSupplier.get().filter(s -> s.contains(LogProps.BUILD_TIMESTAMP_50)).findFirst();
    if (hasTimestamp.isPresent())
    {
        timestamp = hasTimestamp.get();
        String[] temp = timestamp.split(":");
        if (temp.length >= 4)
            timestamp = temp[3].trim();
    }
}
Now the problem is, if I call the first time
Optional<String> hasBuildNumber = streamSupplier.get().filter(s -> s.contains(LogProps.PLM_BUILD)).findFirst();
it is working properly, but if I call the next
Optional<String> hasSCMRevision = streamSupplier.get().filter(s -> s.contains(LogProps.SCM_REVISION_50)).findFirst();
I get the following exception:
Exception in thread "Thread-21" java.lang.IllegalStateException: stream has already been operated upon or closed
at java.util.stream.AbstractPipeline.<init>(AbstractPipeline.java:203)
at java.util.stream.ReferencePipeline.<init>(ReferencePipeline.java:94)
at java.util.stream.ReferencePipeline$StatelessOp.<init>(ReferencePipeline.java:618)
at java.util.stream.ReferencePipeline$2.<init>(ReferencePipeline.java:163)
at java.util.stream.ReferencePipeline.filter(ReferencePipeline.java:162)
at com.dscsag.dscxps.model.analysis.Analysis.getECTRBuildInformation(Analysis.java:205)
at com.dscsag.dscxps.model.analysis.Analysis.parseLogFile(Analysis.java:153)
at com.dscsag.dscxps.model.analysis.Analysis.analyze(Analysis.java:135)
at com.dscsag.dscxps.model.XPSModel.lambda$startAnalysis$0(XPSModel.java:467)
at com.dscsag.dscxps.model.XPSModel$$Lambda$1/12538894.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)
Since I read this page http://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/ I thought it should work, because the supplier provides a new stream on each get().
If you re-write your supplier as an anonymous pre-Java 8 class, it would be equivalent to:
Supplier<Stream<String>> streamSupplier = new Supplier<Stream<String>>() {
    @Override
    public Stream<String> get() {
        return lines;
    }
};
Maybe here it becomes more obvious that you are returning the same stream instance each time you call get on your supplier (and hence the exception thrown on the second call because findFirst is a short-circuiting terminal operation). You are not returning a brand new Stream.
In the webpage example you gave, the writer uses Stream.of, which creates a brand new Stream each time get is called; that's why it works there.
AFAIK there is no way to duplicate an existing Stream, so one workaround is to pass in the object from which the Stream comes and then obtain the Stream inside the supplier:
public class Test {

    public static void main(String[] args) {
        getBuildInformation(Arrays.asList("TEST", "test"));
    }

    private static void getBuildInformation(List<String> lines) {
        Supplier<Stream<String>> streamSupplier = () -> lines.stream();

        Optional<String> hasBuildNumber = streamSupplier.get().filter(s -> s.contains("t")).findFirst();
        System.out.println(hasBuildNumber);

        Optional<String> hasSCMRevision = streamSupplier.get().filter(s -> s.contains("T")).findFirst();
        System.out.println(hasSCMRevision);
    }
}
Which output:
Optional[test]
Optional[TEST]
Since you get the lines from a Path object, handling the exception in the Supplier itself can get quite ugly, so you can create a helper method that handles the exception to be caught. It would look like this:
private static void getBuildInformation(Path path) {
    Supplier<Stream<String>> streamSupplier = () -> lines(path);
    // do your stuff
}

private static Stream<String> lines(Path path) {
    try {
        return Files.lines(path);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}
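If you only have the Stream<String> itself and not the Path it came from, another option is to materialize the lines into a List once and build fresh streams from it. This trades memory for reusability, so it only suits inputs that fit in RAM (the literal strings below are stand-ins for your LogProps values):

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReusableLines {
    public static void main(String[] args) {
        Stream<String> lines = Stream.of("PLM_BUILD = 42", "SCM Revision: abc");

        // Consume the one-shot stream exactly once, then re-stream the List
        // as often as needed -- each cached::stream call is a fresh Stream.
        List<String> cached = lines.collect(Collectors.toList());
        Supplier<Stream<String>> supplier = cached::stream;

        Optional<String> build = supplier.get()
                .filter(s -> s.contains("PLM_BUILD")).findFirst();
        Optional<String> rev = supplier.get()
                .filter(s -> s.contains("Revision")).findFirst();
        System.out.println(build.orElse("?"));   // prints PLM_BUILD = 42
        System.out.println(rev.orElse("?"));     // prints SCM Revision: abc
    }
}
```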
