Java 8 streams, lambdas - java

I am trying to learn how to use Java 8 features (such as lambdas and streams) in my daily programming, since they make for much cleaner code.
Here's what I am currently working on:
I get a string stream from a local file with some data which I turn into objects later. The input file structure looks something like this:
Airport name; Country; Continent; some number;
And my code looks like this:
public class AirportConsumer implements AirportAPI {
List<Airport> airports = new ArrayList<Airport>();
@Override
public Stream<Airport> getAirports() {
Stream<String> stream = null;
try {
stream = Files.lines(Paths.get("resources/planes.txt"));
stream.forEach(line -> createAirport(line));
} catch (IOException e) {
e.printStackTrace();
}
return airports.stream();
}
public void createAirport(String line) {
String airport, country, continent;
int length;
airport = line.substring(0, line.indexOf(';')).trim();
line = line.replace(airport + ";", "");
country = line.substring(0,line.indexOf(';')).trim();
line = line.replace(country + ";", "");
continent = line.substring(0,line.indexOf(';')).trim();
line = line.replace(continent + ";", "");
length = Integer.parseInt(line.substring(0,line.indexOf(';')).trim());
airports.add(new Airport(airport, country, continent, length));
}
}
And in my main class I iterate over the object stream and print out the results:
public class Main {
public void toString(Airport t){
System.out.println(t.getName() + " " + t.getContinent());
}
public static void main(String[] args) throws IOException {
Main m = new Main();
m.whatever();
}
private void whatever() throws IOException {
AirportAPI k = new AirportConsumer();
Stream<Airport> s;
s = k.getAirports();
s.forEach(this::toString);
}
}
My question is this: How can I optimize this code so I don't have to parse the lines from the file separately, but instead create a stream of Airport objects straight from the source file? Or is this the extent to which I can do this?

You need to use map() to transform the data as it comes past.
Files.lines(Paths.get("resources/planes.txt"))
.map(line -> createAirport(line));
This will return a Stream<Airport> - if you want to return a List, then you'll need to use the collect method at the end.
This approach is also stateless, which means you won't need the instance-level airports value.
You'll need to update your createAirport method to return something:
public Airport createAirport(String line) {
String airport = line.substring(0, line.indexOf(';')).trim();
line = line.replace(airport + ";", "");
String country = line.substring(0,line.indexOf(';')).trim();
line = line.replace(country + ";", "");
String continent = line.substring(0,line.indexOf(';')).trim();
line = line.replace(continent + ";", "");
int length = Integer.parseInt(line.substring(0,line.indexOf(';')).trim());
return new Airport(airport, country, continent, length);
}
If you're looking for a more functional approach to your code, you may want to consider a rewrite of createAirport so it doesn't mutate line. Builders are also nice for this kind of thing.
public Airport createAirport(final String line) {
final String[] fields = line.split(";");
return new Airport(fields[0].trim(),
fields[1].trim(),
fields[2].trim(),
Integer.parseInt(fields[3].trim()));
}
Throwing it all together, your class now looks like this.
public class AirportConsumer implements AirportAPI {
@Override
public Stream<Airport> getAirports() {
Stream<Airport> stream = null;
try {
stream = Files.lines(Paths.get("resources/planes.txt"))
.map(line -> createAirport(line));
} catch (IOException e) {
stream = Stream.empty();
e.printStackTrace();
}
return stream;
}
private Airport createAirport(final String line) {
final String[] fields = line.split(";");
return new Airport(fields[0].trim(),
fields[1].trim(),
fields[2].trim(),
Integer.parseInt(fields[3].trim()));
}
}

The code posted by Steve looks great, but two things can still be improved:
1. How the string is split.
2. It may cause problems if callers forget, or don't know, to close the stream returned by getAirports(). So it's better to finish the task (toList() or whatever) in place.
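With the JDK alone, the second point could be sketched roughly as follows: the file is opened in a try-with-resources block and the result is collected into a list before returning, so the file handle is closed even if parsing fails. The Airport class below and the AirportLoader name are stand-ins assumed for illustration, since the real Airport class is not shown in the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the Airport class used in the question.
final class Airport {
    final String name;
    final String country;
    final String continent;
    final int length;

    Airport(String name, String country, String continent, int length) {
        this.name = name;
        this.country = country;
        this.continent = continent;
        this.length = length;
    }
}

// Hypothetical helper; the name is not from the original answers.
final class AirportLoader {

    // Parses one "name; country; continent; number;" line.
    static Airport parse(String line) {
        String[] fields = line.split(";");
        return new Airport(fields[0].trim(),
                           fields[1].trim(),
                           fields[2].trim(),
                           Integer.parseInt(fields[3].trim()));
    }

    // Reads the whole file and closes it before returning.
    static List<Airport> load(Path file) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> !line.trim().isEmpty()) // skip blank lines
                        .map(AirportLoader::parse)
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning a List trades laziness for safety: callers can call stream() on it as often as they like and have nothing to close.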
Here is the code using abacus-common:
try(Reader reader = IOUtil.createBufferedReader(file)) {
List<Airport> airportList = Stream.of(reader).map(line -> {
String[] strs = Splitter.with(";").trim(true).splitToArray(line);
return new Airport(strs[0], strs[1], strs[2], Integer.valueOf(strs[3]));
}).toList();
} catch (IOException e) {
throw new RuntimeException(e);
}
// Or by using Try:
List<Airport> airportList = Try.stream(file).call(s -> s.map(line -> {
String[] strs = Splitter.with(";").trim(true).splitToArray(line);
return new Airport(strs[0], strs[1], strs[2], Integer.valueOf(strs[3]));
}).toList());
Disclosure: I'm the developer of abacus-common.

Related

process data csv with Java 8 and streams

I'm learning Java 8 and I'm trying to process a CSV file in Java:
List<Catalogo> catalogos = new ArrayList<>();
try (Stream<String> lines = Files.lines(Paths.get("src\\main\\resources\\productos.csv"), Charset.forName("Cp1252"))) {
List<String[]> data = lines.map(s -> s.split(","))
.collect(Collectors.toList());
createCatalog(catalogos, data);
catalogos.forEach(System.out::println);
} catch (IOException e) {
e.printStackTrace();
}
}
public static void createCatalog(List<Catalogo> catalogos, List<String[]> data) {
for (String[] x : data) {
for (int i = 0; i < x.length; i++) {
Catalogo catalogo = new Catalogo();
catalogo.setCodigo(x[0]);
catalogo.setProducto(x[1]);
catalogo.setTipo(x[2]);
catalogo.setPrecio(x[3]);
catalogo.setInventario(x[4]);
catalogos.add(catalogo);
}
}
}
I would like to know if it's possible to improve this code; I don't like the way I have done it.
You can map directly to your object using a constructor that accepts all your attributes, such as:
try...
List<Catalogo> catalogos = lines.map(s -> s.split(","))
.map(s -> new Catalogo(s[0], s[1], s[2], s[3], s[4]))
.collect(Collectors.toList());
catch...
where the constructor, based on the existing code, would have the signature:
Catalogo(String codigo, String producto, String tipo, String precio, String inventario)

Java - Method for batch processing text files is much slower than performing the same action individually the same number of times

I wrote a method processTrainDirectory which is supposed to import and process all the text files from a given directory. Individually processing the files takes about the same time for each file (90ms), but when I use the method for batch importing a given directory, the time per file increases incrementally (from 90ms to over 4000ms after 300 files). The batch importing method is as follows:
public void processTrainDirectory(String folderPath, Category category) {
File folder = new File(folderPath);
File[] listOfFiles = folder.listFiles();
if (listOfFiles != null) {
for (File file : listOfFiles) {
if (file.isFile()) {
processTrainText(file.getPath(), category);
}
}
}
else {
System.out.println(foo);
}
}
As I said, the method processTrainText is called per text file in the directory. This method takes incrementally longer when used inside processTrainDirectory. The method processTrainText is as follows:
public void processTrainText(String path, Category category){
trainTextAmount++;
Map<String, Integer> text = prepareText(path);
update(text, category);
}
I called processTrainText 200 times on 200 different texts manually, and this took 200 * 90ms. But when I have a directory of 200 files and use processTrainDirectory, it takes 90-92-96-104....3897-3940-4002ms per file, which is WAY longer.
The problem persists when I call processTrainText a second time; it does not reset. Do you have any idea why this is or what the cause is, and how I can solve it?
Any help is greatly appreciated!
EDIT: Somebody asked what the other called methods do, so here are all the methods used from my class BayesianClassifier (all others have been removed for clarity). Below that you can find the class Category:
public class BayesianClassifier {
private Map<String, Integer> vocabulary;
private List<Category> categories;
private int trainTextAmount;
private int testTextAmount;
private GUI gui;
public Map<String, Integer> prepareText(String path) {
String text = readText(path);
String normalizedText = normalizeText(text);
String[] tokenizedText = tokenizeText(normalizedText);
return countText(tokenizedText);
}
public String readText(String path) {
BufferedReader br;
String result = "";
try {
br = new BufferedReader(new FileReader(path));
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append("\n");
line = br.readLine();
}
result = sb.toString();
br.close();
} catch (IOException e) {
e.printStackTrace();
}
return result;
}
public Map<String, Integer> countText(String[] words){
Map<String, Integer> result = new HashMap<>();
for(int i=0; i < words.length; i++){
if (!result.containsKey(words[i])){
result.put(words[i], 1);
}
else {
result.put(words[i], result.get(words[i]) + 1);
}
}
return result;
}
public void processTrainText(String path, Category category){
trainTextAmount++;
Map<String, Integer> text = prepareText(path);
update(text, category);
}
public void update(Map<String, Integer> text, Category category) {
category.addText();
for (Map.Entry<String, Integer> entry : text.entrySet()){
if(!vocabulary.containsKey(entry.getKey())){
vocabulary.put(entry.getKey(), entry.getValue());
category.updateFrequency(entry);
category.updateProbability(entry);
category.updatePrior();
}
else {
vocabulary.put(entry.getKey(), vocabulary.get(entry.getKey()) + entry.getValue());
category.updateFrequency(entry);
category.updateProbability(entry);
category.updatePrior();
}
for(Category cat : categories){
if (!cat.equals(category)){
cat.addWord(entry.getKey());
cat.updatePrior();
}
}
}
}
public void processTrainDirectory(String folderPath, Category category) {
File folder = new File(folderPath);
File[] listOfFiles = folder.listFiles();
if (listOfFiles != null) {
for (File file : listOfFiles) {
if (file.isFile()) {
processTrainText(file.getPath(), category);
}
}
}
else {
System.out.println(foo);
}
}
This is my Category class (all the methods that are not needed have been removed for clarity):
public class Category {
private String categoryName;
private double prior;
private Map<String, Integer> frequencies;
private Map<String, Double> probabilities;
private int textAmount;
private BayesianClassifier bc;
public Category(String categoryName, BayesianClassifier bc){
this.categoryName = categoryName;
this.bc = bc;
this.frequencies = new HashMap<>();
this.probabilities = new HashMap<>();
this.textAmount = 0;
this.prior = 0.00;
}
public void addWord(String word){
this.frequencies.put(word, 0);
this.probabilities.put(word, 0.0);
}
public void updateFrequency(Map.Entry<String, Integer> entry){
if(!this.frequencies.containsKey(entry.getKey())){
this.frequencies.put(entry.getKey(), entry.getValue());
}
else {
this.frequencies.put(entry.getKey(), this.frequencies.get(entry.getKey()) + entry.getValue());
}
}
public void updateProbability(Map.Entry<String, Integer> entry){
double chance = ((double) this.frequencies.get(entry.getKey()) + 1) / (sumFrequencies() + bc.getVocabulary().size());
this.probabilities.put(entry.getKey(), chance);
}
public Integer sumFrequencies(){
Integer sum = 0;
for (Integer integer : this.frequencies.values()) {
sum = sum + integer;
}
return sum;
}
}
It looks like the times per file are growing linearly and the total time quadratically. This means that with each file you're processing the data of all previous files. Indeed, you are:
updateProbability calls sumFrequencies, which runs through the entire frequencies map, which grows with each file. That's the culprit. Simply create a field int sumFrequencies and update it in updateFrequency.
As a further improvement, consider using Guava Multiset, which does the counting in a simpler and more efficient way (no autoboxing). After fixing your code, consider letting it be reviewed on CR; there are quite a few minor problems with it.
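A minimal sketch of the running-total idea, assuming a stripped-down class (here called FrequencyTable, a hypothetical name) in place of the full Category class from the question. The sum is updated whenever a frequency changes, so reading it no longer depends on how many words have been seen.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for the Category class in the question.
final class FrequencyTable {
    private final Map<String, Integer> frequencies = new HashMap<>();
    private int sumFrequencies = 0; // running total, kept in sync with the map

    // Adds `count` occurrences of `word` and updates the running total.
    void updateFrequency(String word, int count) {
        frequencies.merge(word, count, Integer::sum);
        sumFrequencies += count;
    }

    // Constant time: no loop over all stored values.
    int sumFrequencies() {
        return sumFrequencies;
    }

    // Same smoothed formula as updateProbability in the question.
    double probability(String word, int vocabularySize) {
        int frequency = frequencies.getOrDefault(word, 0);
        return (frequency + 1.0) / (sumFrequencies + vocabularySize);
    }
}
```

In the original class, addWord stores a frequency of 0, so it would not need to touch the running total.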
What is this method doing?
update(text, category);
If it is doing what I am guessing (and this is only a guess), then this may be your bottleneck.
If you call it on its own, without additional context, and it updates some general data structure, then yes, it will always take the same time.
If it updates something that holds data from your past iterations, then I am pretty sure it will take more and more time. Check the complexity of the update() method and reduce your bottleneck.
Update:
Your method updateProbability works on all the data you have gathered so far when it calculates the sum of frequencies, so it takes more and more time the more files you process. This is your bottleneck.
There is no need to calculate it every time. Just store it and update it whenever something changes, to minimize the amount of calculation.

Append a string in front of line in java?

I am creating a pattern-lock-based project in Android.
I have a file called category.txt
The content of the file is as below
Sports:Race:Arcade:
Now what I want is that whenever the user draws a pattern for a specific games category, the pattern should get appended in front of that category.
e.g.:
Sports:Race:"string/pattern string to be appended here for race"Arcade:
I have used the following code, but it is not working.
private void writefile(String getpattern,String category)
{
String str1;
try {
file = new RandomAccessFile(filewrite, "rw");
while((str1 = file.readLine()) != null)
{
String line[] = str1.split(":");
if(line[0].toLowerCase().equals(category.toLowerCase()))
{
String colon=":";
file.write(category.getBytes());
file.write(colon.getBytes());
file.write(getpattern.getBytes());
file.close();
Toast.makeText(getActivity(),"In Writefile",Toast.LENGTH_LONG).show();
}
}
}
catch (FileNotFoundException e)
{
e.printStackTrace();
}
catch(IOException io)
{
io.printStackTrace();
}
}
Please help!
Using RandomAccessFile you have to calculate the position. I think it's much easier to just replace the file content with a little help from Apache Commons IO FileUtils. This might not be the best idea if you have a very large file, but it's quite simple.
String givenCategory = "Sports";
String pattern = "stringToAppend";
final String colon = ":";
try {
List<String> lines = FileUtils.readLines(new File("someFile.txt"));
String modifiedLine = null;
int index = 0;
for (String line : lines) {
String[] categoryFromLine = line.split(colon);
if (givenCategory.equalsIgnoreCase(categoryFromLine[0])) {
modifiedLine = new StringBuilder().append(pattern).append(colon).append(givenCategory).append(colon).toString();
break;
}
index++;
}
if (modifiedLine != null) {
lines.set(index, modifiedLine);
FileUtils.writeLines(new File("someFile.txt"), lines);
}
} catch (IOException e1) {
// do something
}
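The same read, modify, rewrite approach can also be sketched with the JDK alone. This version assumes the pattern should be inserted as its own colon-terminated field right after the matching category, which is one reading of the example in the question; the class and method names are hypothetical, and a pattern that itself contains a colon would break the format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative only.
final class CategoryFile {

    // "Sports:Race:Arcade:" + ("race", "P") becomes "Sports:Race:P:Arcade:".
    static String insertAfterCategory(String line, String category, String pattern) {
        if (line.isEmpty()) {
            return line;
        }
        StringBuilder result = new StringBuilder();
        for (String part : line.split(":")) {
            result.append(part).append(':');
            if (part.equalsIgnoreCase(category)) {
                result.append(pattern).append(':'); // assumed placement of the pattern
            }
        }
        return result.toString();
    }

    // Rewrites the whole file; fine for small files, wasteful for large ones.
    static void addPattern(Path file, String category, String pattern) throws IOException {
        List<String> updated = new ArrayList<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            updated.add(insertAfterCategory(line, category, pattern));
        }
        Files.write(file, updated, StandardCharsets.UTF_8);
    }
}
```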

IllegalStateException with StreamSupplier

I have the following code to do different things in one stream.
private void getBuildInformation(Stream<String> lines)
{
Supplier<Stream<String>> streamSupplier = () -> lines;
String buildNumber = null;
String scmRevision = null;
String timestamp = null;
String buildTag = null;
Optional<String> hasBuildNumber = streamSupplier.get().filter(s -> s.contains(LogProps.PLM_BUILD)).findFirst();
if (hasBuildNumber.isPresent())
{
buildNumber = hasBuildNumber.get();
String[] temp = buildNumber.split("=");
if (temp.length >= 2)
buildNumber = temp[1].trim();
}
Optional<String> hasSCMRevision = streamSupplier.get().filter(s -> s.contains(LogProps.SCM_REVISION_50)).findFirst();
if (hasSCMRevision.isPresent())
{
scmRevision = hasSCMRevision.get();
String[] temp = scmRevision.split(":");
if (temp.length >= 4)
scmRevision = temp[3].trim();
}
Optional<String> hasBuildTag = streamSupplier.get().filter(s -> s.contains(LogProps.BUILD_TAG_50)).findFirst();
if (hasBuildTag.isPresent())
{
buildTag = hasBuildTag.get();
String[] temp = buildTag.split(":");
if (temp.length >= 4)
buildTag = temp[3].trim();
}
Optional<String> hasTimestamp = streamSupplier.get().filter(s -> s.contains(LogProps.BUILD_TIMESTAMP_50)).findFirst();
if (hasTimestamp.isPresent())
{
timestamp = hasTimestamp.get();
String[] temp = timestamp.split(":");
if (temp.length >= 4)
timestamp = temp[3].trim();
}
}
Now the problem is, if I call the first time
Optional<String> hasBuildNumber = streamSupplier.get().filter(s -> s.contains(LogProps.PLM_BUILD)).findFirst();
it is working properly, but if I call the next
Optional<String> hasSCMRevision = streamSupplier.get().filter(s -> s.contains(LogProps.SCM_REVISION_50)).findFirst();
I get the following exception:
Exception in thread "Thread-21" java.lang.IllegalStateException: stream has already been operated upon or closed
at java.util.stream.AbstractPipeline.<init>(AbstractPipeline.java:203)
at java.util.stream.ReferencePipeline.<init>(ReferencePipeline.java:94)
at java.util.stream.ReferencePipeline$StatelessOp.<init>(ReferencePipeline.java:618)
at java.util.stream.ReferencePipeline$2.<init>(ReferencePipeline.java:163)
at java.util.stream.ReferencePipeline.filter(ReferencePipeline.java:162)
at com.dscsag.dscxps.model.analysis.Analysis.getECTRBuildInformation(Analysis.java:205)
at com.dscsag.dscxps.model.analysis.Analysis.parseLogFile(Analysis.java:153)
at com.dscsag.dscxps.model.analysis.Analysis.analyze(Analysis.java:135)
at com.dscsag.dscxps.model.XPSModel.lambda$startAnalysis$0(XPSModel.java:467)
at com.dscsag.dscxps.model.XPSModel$$Lambda$1/12538894.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)
Since I read this page http://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/ I think it should be working, because the supplier provides new streams on get().
If you rewrite your supplier as an anonymous pre-Java 8 class, it would be equivalent to:
Supplier<Stream<String>> streamSupplier = new Supplier<Stream<String>>() {
@Override
public Stream<String> get() {
return lines;
}
};
Maybe here it becomes more obvious that you are returning the same stream instance each time you call get on your supplier. A stream can be consumed only once, and findFirst is a (short-circuiting) terminal operation, so the first call uses up the stream and the second call throws the exception. You are not returning a brand new Stream.
In the webpage example you gave, the writer uses Stream.of, which creates a brand new Stream each time get is called; that's why it works.
AFAIK there is no way to duplicate a Stream from an existing one. So one workaround would be to pass in the object from which the Stream comes and then get the Stream in the supplier.
public class Test {
public static void main(String[] args) {
getBuildInformation(Arrays.asList("TEST", "test"));
}
private static void getBuildInformation(List<String> lines) {
Supplier<Stream<String>> streamSupplier = () -> lines.stream();
Optional<String> hasBuildNumber = streamSupplier.get().filter(s -> s.contains("t")).findFirst();
System.out.println(hasBuildNumber);
Optional<String> hasSCMRevision = streamSupplier.get().filter(s -> s.contains("T")).findFirst();
System.out.println(hasSCMRevision);
}
}
Which outputs:
Optional[test]
Optional[TEST]
Since you get the lines from a Path object, handling the exception in the Supplier itself can get quite ugly, so you can create a helper method that handles the checked exception. It would then look like this:
private static void getBuildInformation(Path path) {
Supplier<Stream<String>> streamSupplier = () -> lines(path);
//do your stuff
}
private static Stream<String> lines(Path path) {
try {
return Files.lines(path);
}
catch (IOException e) {
throw new UncheckedIOException(e);
}
}
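A variant that is not part of the answer above: if the log file fits in memory, it could be read once into a List (for example with Files.readAllLines) and queried several times, which avoids reopening the file for each lookup. The sketch below uses a hypothetical helper named BuildInfo.field; unlike the code in the question, it returns an empty Optional when the matching line has too few fields, rather than falling back to the whole line.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper; the name and signature are illustrative only.
final class BuildInfo {

    // Finds the first line containing `marker`, splits it on `separator`,
    // and returns the trimmed field at `index` if it exists.
    static Optional<String> field(List<String> lines, String marker,
                                  String separator, int index) {
        return lines.stream() // a fresh stream on every call
                .filter(line -> line.contains(marker))
                .findFirst()
                .map(line -> line.split(Pattern.quote(separator)))
                .filter(parts -> parts.length > index)
                .map(parts -> parts[index].trim());
    }
}
```

Because each call starts from lines.stream(), the four lookups in the question would no longer share one stream instance.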

Help with line.split CSV

I'm a beginner, so please don't blast my work so far :)
I'm trying to read in a CSV file and then see if it matches certain commands. Some of the data in the CSV has a period and I think it's messing up when I'm trying to split it. When I try to dump my arrays to see what's there, it always gets cut off after the period. Here is a data sample. Any help would be appreciated. Again I'm a beginner so simplicity would be appreciated.
Sample data
create,Mr. Jones,blah,blah
create,Mrs. Smith,blah,blah
public class TestHarness {
public static void main(String[] args) throws IOException, FileNotFoundException {
Scanner input = new Scanner(new File("C:\\Users\\Ree\\Desktop\\SPR\\commands.txt"));
String[] validCommands = { "create", "move", "useWeapon", "search", "heal" };
boolean proceed = false;
while (!proceed)
{
for (int i = 0; i < validCommands.length; i++)
{
String line = input.next();
String[] nline = line.split (",");
if (nline[0].equals("create"))
{
String soldierName = nline[1];
String soldierType = nline[2];
String weapon = nline[3];
Soldier aSoldier = new Soldier(soldierName,weapon);
System.out.println("Command: "+ nline[0] +","+ soldierName +","+ soldierType+","+ weapon);
if (soldierType.equals("enlisted"))
{
Soldier s = new Enlisted(soldierName,weapon);
System.out.println(s.toString());
}
else if (soldierType.equals("officer"))
{
Soldier s = new Officer(soldierName,weapon);
System.out.println(s.toString());
}
}
else if (nline[0].equals("useWeapon")) {
System.out.print("weapon " + nline[0] + "\n");
}
else if (nline[0].equals("move")) {
System.out.print("move " + nline[0] + "\n");
}
else if (nline[0].equals("search")) {
System.out.print("search " + nline[0] + "\n");
}
else if (nline[0].equals("heal")) {
System.out.print("heal " + nline[0] + "\n");
}
}
}
}
}
Calling Scanner.next will only return the next word (separated by whitespace).
You need to call nextLine to read entire lines at a time.
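The difference can be seen in a small sketch that runs both calls against the sample data held in a String instead of a file. The ScannerDemo class and its method names are made up for illustration.

```java
import java.util.Scanner;

// Hypothetical demo class; not part of the original program.
final class ScannerDemo {

    // next() stops at the first whitespace, so "Mr. Jones" is cut after "Mr.".
    static String[] firstTokenFields(String data) {
        try (Scanner in = new Scanner(data)) {
            return in.next().split(",");
        }
    }

    // nextLine() returns the whole line, so all comma-separated fields survive.
    static String[] firstLineFields(String data) {
        try (Scanner in = new Scanner(data)) {
            return in.nextLine().split(",");
        }
    }
}
```

For the first sample line, firstTokenFields yields only "create" and "Mr.", while firstLineFields yields all four fields.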
There are several open source CSV parsers available for Java:
http://supercsv.sourceforge.net/
http://commons.apache.org/sandbox/csv/
That was a rather quick mistake, wasn't it?
This is not an answer to your question but a recommendation to use a hash.
First define an interface
public interface InputDance
{
public void dance(String[] input);
}
I recommend that your main routine should be
public static void main(String[] args)
throws IOException, FileNotFoundException{
Scanner input = new Scanner(
new File("C:\\Users\\Ree\\Desktop\\SPR\\commands.txt"));
String line = input.nextLine();
String[] nline = line.split (",");
InputDance inputxn = inputActions.get(nline[0]);
if (inputxn!=null)
inputxn.dance(nline);
}
You would be using a hash to store all the actions outlined by the interface InputDance.
So that your input reading routine would be simplified to
retrieve action from hash of actions
using word0 of input as key.
execute that action
If you have only five types of soldiers, it would be OK to place all your logic in one routine.
However, for more than 10 types of personnel, it would be cleaner to place the actions outside the routine.
If you are writing a computer game, or keeping personnel records on a military database, you would frequently encounter enhancement requests to include new personnel types or exceptions to the rule. Then your if-then-else-if chain would become increasingly longer and confusing. Especially when there are special requirements for soldiers dancing to a different tune. Or when your game canvas or personnel database needs to include non-battle units. But, of course, you still need to update the hash in the main class every time you have a new personnel type.
Notice that in my recommendation all your routine would do is to perform the dance(String[]) method. Any complication would be handled by the individual class implementing the dance.
Next define an implementing class
public class SoldierDance
implements InputDance
{
public void dance(String[] nline){
String soldierName = nline[1];
String soldierType = nline[2];
String weapon = nline[3];
System.out.println(
"Command: "+ nline[0] +","+ soldierName +","+ soldierType+","+ weapon);
Soldier s;
if (soldierType.equals("enlisted")){
s = new Enlisted(soldierName,weapon);
}
else if (soldierType.equals("officer")){
s = new Officer(soldierName,weapon);
}
else{
s = new Soldier(soldierName,weapon);
}
System.out.println(s.toString());
}
}
Then define your main class. Notice that the hash is a static instance.
Also, there is a placeholder dance so that when you have a new personnel type, but you don't know what to do with it yet, you just hash the new personnel type to this placeholder dance.
Notice, for example in the "useWeapon" hash key, that an interface can be implemented anonymously too:
public class TestHarness
{
static public class PlaceHolderDance
implements InputDance
{
public void dance(String[] nline){
System.out.print("Action=" + nline[0] + "\n");
}
}
static public Hashtable<String, InputDance> inputActions;
// A static initializer block runs once, when the class is loaded.
static {
inputActions = new Hashtable<String, InputDance>();
InputDance placeHolderAction = new PlaceHolderDance();
inputActions.put("create", new SoldierDance());
inputActions.put("move", placeHolderAction);
inputActions.put("search", placeHolderAction);
inputActions.put("heal", placeHolderAction);
// Can also anonymously implement an interface
inputActions.put("useWeapon",
new InputDance(){
public void dance(String[] nline){
System.out.print("weapon " + nline[0] + "\n");
}
}
);
}
// The static main method
public static void main(String[] args)
throws IOException, FileNotFoundException{
Scanner input = new Scanner(
new File("C:\\Users\\Ree\\Desktop\\SPR\\commands.txt"));
String line = input.nextLine();
String[] nline = line.split (",");
InputDance inputxn = inputActions.get(nline[0]);
if (inputxn!=null)
inputxn.dance(nline);
}
}
And, if there is a one-to-one correspondence between a soldier class and its InputDance, you could even implement InputDance in the soldier class and provide the dance method there.
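On Java 8 or later, the same lookup-table idea could be sketched with lambdas in a HashMap instead of named and anonymous classes. This is a variant under stated assumptions, not the code above: CommandDispatcher is a hypothetical name, each action returns a String rather than printing, and the loop reads every line of the input rather than only the first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.function.Function;

// Hypothetical Java 8 variant of the hash-of-actions idea.
final class CommandDispatcher {
    private final Map<String, Function<String[], String>> actions = new HashMap<>();

    CommandDispatcher() {
        // Shared placeholder for commands without real behavior yet.
        Function<String[], String> placeholder = fields -> "Action=" + fields[0];
        // Assumes a "create" line has at least four fields.
        actions.put("create", fields ->
                "create " + fields[1] + " (" + fields[2] + ") with " + fields[3]);
        actions.put("useWeapon", fields -> "weapon " + fields[0]);
        actions.put("move", placeholder);
        actions.put("search", placeholder);
        actions.put("heal", placeholder);
    }

    // Runs the matching action for every non-blank line; unknown commands are skipped.
    List<String> run(Scanner input) {
        List<String> results = new ArrayList<>();
        while (input.hasNextLine()) {
            String line = input.nextLine();
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split(",");
            Function<String[], String> action = actions.get(fields[0]);
            if (action != null) {
                results.add(action.apply(fields));
            }
        }
        return results;
    }
}
```

A caller might pass new Scanner(new File(...)) for the commands file and print each returned string.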
