I've run into a difficult problem and am looking for suggestions on how to approach it. My dataset has three fields, and I want to perform a subtraction. The data looks like this:
Time(s) a x
1 0.1 0.2
2 0.4
3 0.6
4 0.7
5 0.2 0.9
I need to compute (a - x). The rule is this: at time 1 s, a has the value 0.1, so the 1st iteration is (0.1 - 0.2), the 2nd is (0.1 - 0.4), the 3rd is (0.1 - 0.6) and the 4th is (0.1 - 0.7). In the 5th iteration, however, it will be (0.2 - 0.9).
That is my problem statement. I want to implement this in Java. I don't need Java code; I can write it myself. I need a suggestion on how to approach it. One thought is to create an array for each variable, but then I get stuck on the loop: how should it iterate? It is clear that a stays fixed until it gets its next value, which is available at time 5 s.
This will depend on how large your input file is:
If the dataset fits into memory, load it either as 2 separate arrays or as one array of Row objects with a and x as fields. After that it's a simple iteration that remembers the last row that contained a, so that its value can be used when a is missing.
If the dataset is large it's better to read it using BufferedReader and only remember the last encountered a and x. This will greatly reduce the memory consumption and would be the preferred approach.
If a changes every 4 rows, you can use (time - 1) / 4 (integer division) as the index into a small array of a values.
If a does not change at regular intervals, I suggest using a full-length array in which each gap is filled with the previous value.
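A minimal sketch of the in-memory idea, assuming a missing a is stored as null in a Double[] array. The class and method names (ForwardFill, subtract) are made up for illustration and are not part of the question:

```java
final class ForwardFill {
    // Computes a - x for every row, reusing the most recent non-null a.
    // Assumption: a missing a is represented as null, and the first row has a value.
    static double[] subtract(Double[] a, double[] x) {
        if (a.length != x.length) {
            throw new IllegalArgumentException("a and x must have the same length");
        }
        double[] result = new double[x.length];
        Double lastA = null; // last value of a seen so far
        for (int i = 0; i < x.length; i++) {
            if (a[i] != null) {
                lastA = a[i];
            }
            if (lastA == null) {
                throw new IllegalStateException("no value of a before row " + i);
            }
            result[i] = lastA - x[i];
        }
        return result;
    }
}
```

With the sample data, calling subtract(new Double[]{0.1, null, null, null, 0.2}, new double[]{0.2, 0.4, 0.6, 0.7, 0.9}) uses 0.1 for the first four rows and 0.2 for the fifth.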
Now that I see you're not using a database and just reading from a file, maybe try this
Just keep the old value of a until a new value can overwrite it.
This is memory efficient since it parses line by line.
public static List<Double> parseFile(String myFile) throws IOException {
List<Double> results = new ArrayList<>();
try (BufferedReader b = new BufferedReader(new FileReader(myFile));) {
b.readLine(); // ** skip header?
String line;
Integer time = null;
Double a = null;
Double x = null;
for (int lineNum = 0; (line = b.readLine()) != null; lineNum++) {
// ** split the data on any-and-all-whitespace
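final String[] data = line.trim().split("\\s+");
// ** rows have 3 fields (time, a, x), or 2 fields (time, x) when a is missing
if (data.length != 2 && data.length != 3)
throw new RuntimeException("Invalid data format on line " + lineNum);
try {
time = Integer.valueOf(data[0]);
if (data.length == 3) {
// ** a new value of a replaces the remembered one
a = Double.valueOf(data[1]);
}
// ** x is always the last field
x = Double.valueOf(data[data.length - 1]);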
} catch (Exception e) {
throw new RuntimeException("Couldn't parse line " + lineNum, e);
}
if (a == null || x == null) {
throw new RuntimeException("Values not initialized at line " + lineNum);
}
results.add(Double.valueOf(a.doubleValue() - x.doubleValue()));
}
}
// ** finished parsing file, return results
return results;
}
I made this program in Java, in the BlueJ IDE. It is meant to take a decimal number and convert it into a base of the user's choice, up to base 9. It does this by taking the modulus of the two numbers and inserting it into a string. The code works up to the input stage, after which there is no output. I am sure my maths is right, but the syntax may have a problem.
My code is as follows:
import java.util.*;
public class Octal
{
public static void main(String[] args)
{
Scanner in = new Scanner(System.in);
int danum = 0;
int base = 0;
System.out.println("Please enter the base you want the number in (till decimal). Enter as a whole number");
base=in.nextInt(); //This is the base the user wants the number converted in//
System.out.println("Enter the number you want converted (enter in decimal)");
danum=in.nextInt(); //This is the number the user wants converted//
while ( danum/base >= base-1 && base < danum) {
int rem = danum/base; //The number by the base//
int modu = danum % base;//the modulus//
String summat = Integer.toString(modu);//this is to convert the integer to the string//
String strConverted = new String();//Making a new string??//
StringBuffer buff = new StringBuffer(strConverted);//StringBuffer command//
buff.insert(0, summat); //inserting the modulus into the first position (0 index)//
danum = rem;
if ( rem <= base-1 || base>danum) {//does the || work guys?//
System.out.println(rem + strConverted);
}
else {
System.out.println(strConverted);
}
}
}
}
I am very new to Java, so I am not fully aware of the syntax. I have done my best to research so that I don't waste your time. Please give me suggestions on how to improve my code and my skill as a programmer. Thanks.
Edit (previous answer was obviously too quick a response...)
String summat = Integer.toString(modu);
String strConverted = new String();
StringBuffer buff = new StringBuffer(strConverted);
buff.insert(0, summat);
...
System.out.println(strConverted);
Actually, strConverted is still an empty string; you probably want to display buff.toString() instead.
But I don't really understand why you do all of this just to display the value of modu. You could just write System.out.println(modu).
I assume that you want to "save" each digit and display the whole number at once, rather than one digit per line.
So you need to keep your buffer outside of the while loop; otherwise it is re-initialized on every iteration. (Print outside the loop as well.)
So, initialize your StringBuffer outside of the loop. You don't need to convert your int to a String, since StringBuffer.insert accepts an int:
http://docs.oracle.com/javase/8/docs/api/java/lang/StringBuffer.html#insert-int-int-
(You could even use StringBuilder instead of StringBuffer. It works the same, except that StringBuffer is synchronized:
https://docs.oracle.com/javase/8/docs/api/java/lang/StringBuilder.html)
The if inside your loop handles a special case (number lower than the base) that should be checked before the loop, since it is the opposite of your loop condition. (BTW: rem <= base-1 and base>danum are actually the same test, since rem == danum at that point.)
So:
StringBuffer buff = new StringBuffer();
if(base > danum) {
buff.append(danum);
} else {
while (danum >= base) { // keep dividing while more than one digit remains
int rem = danum / base;
int modu = danum % base;
buff.insert(0, modu);
danum = rem;
}
if(danum > 0) {
buff.insert(0, danum);
}
}
System.out.println(buff.toString());
I would also strongly recommend validating your input before running your code (no zero for the base, no letters, etc.).
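Putting the points above together, a self-contained sketch of the conversion with basic input checks could look like the following. The class and method names (BaseConverter, toBase) are made up for illustration, and the accepted range of bases (2 to 10) is an assumption based on the question's "up to decimal" prompt:

```java
final class BaseConverter {
    // Converts a non-negative decimal number to its digit string in the given base.
    // Assumption: only bases 2..10 are supported, so every digit is 0-9.
    static String toBase(int number, int base) {
        if (base < 2 || base > 10) {
            throw new IllegalArgumentException("base must be between 2 and 10");
        }
        if (number < 0) {
            throw new IllegalArgumentException("number must not be negative");
        }
        StringBuilder digits = new StringBuilder(); // created once, outside the loop
        do {
            digits.insert(0, number % base); // the remainder is the next digit from the right
            number /= base;
        } while (number > 0);
        return digits.toString();
    }
}
```

The do-while form means that 0 and numbers smaller than the base need no special case: the loop body always runs once, so toBase(8, 2) yields "1000" and toBase(7, 9) yields "7".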
Two things:
Do a lot more error checking after getting user input. It avoids weird 'errors' down the path.
Your conversion from int to String inside the loop is wrong. What's the whole deal with summat and buff? Modifying the buffer doesn't affect strConverted (so it's always empty, which is what you see).
try to get rid of this. :)
error is logic related
error is java related
Your code has the following problems:
Firstly, you have declared and initialized your strConverted variable (in which you store your result) inside your while loop. Hence whenever the loop repeats, it creates a new string strConverted with a value "". Hence your answer will never be correct.
Secondly, the StringBuffer buff never changes the string strConverted. You have to update the string explicitly, for example by assigning buff.toString() to it.
Thirdly, you print your result inside your while loop, which prints a step-by-step result after every repetition. You must change the value of strConverted within the loop, but the end result has to be printed outside it.
I'm trying to solve a calculation problem in Java.
Suppose my data looks as follows:
466,2.0762
468,2.0799
470,2.083
472,2.0863
474,2.09
476,2.0939
478,2.098
It's a list of ordered pairs, in the form of [int],[double]. Each line in my file contains one pair. The file can contain seven to seven thousand of those lines, all of them formatted as plain text.
Each [int] must be subtracted from the [int] one line above and the result written onto another file. The same calculation must be done for every [double]. For example, in the data reported above, the calculation should be:
478-476 -> result to file
476-474 -> result to file
(...)
2.098-2.0939 -> result to file
2.0939-2.09 -> result to file
and so on.
I beg your pardon if this question will look trivial for the vast majority of you, but after weeks trying to solve it, I got nowhere. I also had troubles finding something even remotely similar on this board!
Any help will be appreciated.
Thanks!
1. Read the file
2. Build the result
3. Write to a file
For the first task there are already several good answers here, for example try this one: Reading a plain text file in Java.
As you can see there, we can read a file line by line. That way you can build a List<String> which contains the lines of your file.
For the second task, let's iterate through all lines and build the result, again as a List<String>.
List<String> inputLines = ...
List<String> outputLines = new LinkedList<String>();
int lastInt = 0;
double lastDouble = 0;
boolean firstValue = true;
for (String line : inputLines) {
// Split by ",", then values[0] is the integer and values[1] the double
String[] values = line.split(",");
int currentInt = Integer.parseInt(values[0]);
double currentDouble = Double.parseDouble(values[1]);
if (firstValue) {
// Nothing to compare to on the first run
firstValue = false;
} else {
// Compare to last values and build the result
int diffInt = lastInt - currentInt;
double diffDouble = lastDouble - currentDouble;
String outputLine = diffInt + "," + diffDouble;
outputLines.add(outputLine);
}
// Current values become last values
lastInt = currentInt;
lastDouble = currentDouble;
}
For the third task there are again good solutions on SO. You need to iterate through outputLines and save each line in a file: How to create a file and write to a file in Java?
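The three steps can be tied together with java.nio.file.Files, as in the sketch below. The class and method names (PairDifferences, differences, process) are illustrative only. It also assumes each difference is "current minus previous", which matches the question's worked example (478-476); swap the operands if you need the opposite sign:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class PairDifferences {
    // Step 2: returns "intDiff,doubleDiff" for each pair of consecutive lines.
    // Assumption: the difference is current minus previous, as in 478-476.
    static List<String> differences(List<String> inputLines) {
        List<String> output = new ArrayList<>();
        Integer lastInt = null; // null until the first data line has been read
        double lastDouble = 0;
        for (String line : inputLines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] values = line.split(",");
            int currentInt = Integer.parseInt(values[0].trim());
            double currentDouble = Double.parseDouble(values[1].trim());
            if (lastInt != null) {
                output.add((currentInt - lastInt) + "," + (currentDouble - lastDouble));
            }
            lastInt = currentInt;
            lastDouble = currentDouble;
        }
        return output;
    }

    // Steps 1 and 3: read all lines, compute the differences, write them out.
    static void process(Path input, Path output) throws IOException {
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        Files.write(output, differences(lines), StandardCharsets.UTF_8);
    }
}
```

Reading the whole file with readAllLines is fine for seven thousand lines. Note that double subtraction can produce results with small rounding noise; if exact decimal output matters, BigDecimal is the usual alternative.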
My usual procedure for getting the dimensions of a CSV file is as follows:
Get how many rows it has:
I use a while loop to read every line and count each successful read. The downside is that it takes time to read the whole file just to count its rows.
Then get how many columns it has:
I use String[] temp = lineOfText.split(","); and then take the size of temp.
Is there any smarter method? Like:
file1 = read.csv;
xDimention = file1.xDimention;
yDimention = file1.yDimention;
I guess it depends on how regular the structure is, and whether you need an exact answer or not.
I could imagine looking at the first few rows (or randomly skipping through the file), and then dividing the file size by average row size to determine a rough row count.
If you control how these files get written, you could potentially tag them or add a metadata file next to them containing row counts.
Strictly speaking, the way you're splitting the line doesn't cover all possible cases. "hello, world", 4, 5 should read as having 3 columns, not 4.
Your approach won't work with multi-line values (you'll get an invalid number of rows) or with quoted values that happen to contain the delimiter (you'll get an invalid number of columns).
You should use a CSV parser such as the one provided by univocity-parsers.
Using the uniVocity CSV parser, the fastest way to determine the dimensions would be with the following code. It parses a 150MB file to give its dimensions in 1.2 seconds:
// Let's create our own RowProcessor to analyze the rows
static class CsvDimension extends AbstractRowProcessor {
int lastColumn = -1;
long rowCount = 0;
@Override
public void rowProcessed(String[] row, ParsingContext context) {
rowCount++;
if (lastColumn < row.length) {
lastColumn = row.length;
}
}
}
public static void main(String... args) throws FileNotFoundException {
// let's measure the time roughly
long start = System.currentTimeMillis();
//Creates an instance of our own custom RowProcessor, defined above.
CsvDimension myDimensionProcessor = new CsvDimension();
CsvParserSettings settings = new CsvParserSettings();
//This tells the parser that no row should have more than 2,000,000 columns
settings.setMaxColumns(2000000);
//Here you can select the column indexes you are interested in reading.
//The parser will return values for the columns you selected, in the order you defined
//By selecting no indexes here, no String objects will be created
settings.selectIndexes(/*nothing here*/);
//When you select indexes, the columns are reordered so they come in the order you defined.
//By disabling column reordering, you will get the original row, with nulls in the columns you didn't select
settings.setColumnReorderingEnabled(false);
//We instruct the parser to send all rows parsed to your custom RowProcessor.
settings.setRowProcessor(myDimensionProcessor);
//Finally, we create a parser
CsvParser parser = new CsvParser(settings);
//And parse! All rows are sent to your custom RowProcessor (CsvDimension)
//I'm using a 150MB CSV file with about 3.1 million rows.
parser.parse(new FileReader(new File("c:/tmp/worldcitiespop.txt")));
//Nothing else to do. The parser closes the input and does everything for you safely. Let's just get the results:
System.out.println("Columns: " + myDimensionProcessor.lastColumn);
System.out.println("Rows: " + myDimensionProcessor.rowCount);
System.out.println("Time taken: " + (System.currentTimeMillis() - start) + " ms");
}
The output will be:
Columns: 7
Rows: 3173959
Time taken: 1279 ms
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
IMO, What you are doing is an acceptable way to do it. But here are some ways you could make it faster:
Rather than reading lines, which creates a new String Object for each line, just use String.indexOf to find the bounds of your lines
Rather than using line.split, again use indexOf to count the number of commas
Multithreading
I guess here are the options which will depend on how you use the data:
Store dimensions of your csv file when writing the file (in the first row or as in an additional file)
Use a more efficient way to count lines - maybe http://docs.oracle.com/javase/6/docs/api/java/io/LineNumberReader.html
Instead of creating arrays of fixed size (assuming that's what you need the line count for), use array lists. This may or may not be more efficient, depending on the size of the file.
To find the number of rows you have to read the whole file. There is nothing you can do about that. However, your method of finding the number of columns is a bit inefficient. Instead of split, just count how many times "," appears in the line. You might also add a special condition for fields enclosed in quotes, as mentioned by @Vlad.
String.split method creates an array of strings as a result and splits using regexp which is not very efficient.
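A sketch of counting columns without split, while skipping commas inside quoted fields. The names (ColumnCounter, countColumns) are made up for illustration, and the sketch assumes that a quote inside a field is escaped by doubling it and that the record does not span several lines:

```java
final class ColumnCounter {
    // Counts the columns of one CSV line by counting commas outside double quotes.
    // Assumptions: quotes are escaped as "" and the record fits on a single line.
    static int countColumns(String line) {
        int columns = 1; // a line without commas still has one column
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // a doubled quote toggles twice, so the state is unchanged
            } else if (c == ',' && !inQuotes) {
                columns++;
            }
        }
        return columns;
    }
}
```

For the line "hello, world", 4, 5 mentioned earlier, this returns 3, whereas splitting on "," would give 4.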
I found this short but interesting solution here:
https://stackoverflow.com/a/5342096/4082824
LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")));
lnr.skip(Long.MAX_VALUE);
System.out.println(lnr.getLineNumber() + 1); //Add 1 because line index starts at 0
lnr.close();
My solution is simple and correctly processes CSV files with multi-line cells or quoted values.
For example, suppose we have this CSV file:
1,"""2""","""111,222""","""234;222""","""""","1
2
3"
2,"""2""","""111,222""","""234;222""","""""","2
3"
3,"""5""","""1112""","""10;2""","""""","1
2"
And my solution snippet is:
import java.io.*;
public class CsvDimension {
public void parse(Reader reader) throws IOException {
long cells = 0;
int lines = 0;
int c;
boolean quoted = false;
while ((c = reader.read()) != -1) {
if (c == '"') {
quoted = !quoted;
}
if (!quoted) {
if (c == '\n') {
lines++;
cells++;
}
if (c == ',') {
cells++;
}
}
}
System.out.printf("lines : %d\n cells %d\n cols: %d\n", lines, cells, cells / lines);
reader.close();
}
public static void main(String args[]) throws IOException {
new CsvDimension().parse(new BufferedReader(new FileReader(new File("test.csv"))));
}
}
I have two tsv files to parse and extract values from each file. Each line may have 4-5 attributes per line. The content of both the files are as below :
1 44539 C T 19.44
1 44994 A G 4.62
1 45112 TATGG 0.92
2 43635 Z Q 0.87
3 5672 AAS 0.67
There are some records in each file that have first 3 or 4 attributes same but different value. I want to retain higher value of such records and prepare new file with all unique values. For example:
1 44539 C T 19.44
1 44539 C T 25.44
In the case above I need to retain the record with the higher value, i.e. the one with value 25.44.
I have drafted code for this, but after a few minutes the program slows down. I read each record from a file, form a key-value pair with the first 3 or 4 fields as the key and the last field as the value, store it in a HashMap, and then write the map out to a file. Is there a better solution?
Also, how can I test whether my code is writing the correct output to the file?
One file is of size 498 MB with 23822225 records and other is of 515 MB with 24500367 records.
I get Exception in thread "main" java.lang.OutOfMemoryError: Java heap space error for the file with size 515 MB.
Is there a better way to code this so that the program runs efficiently without increasing the heap size?
I might have to deal with larger files in future, what would be the trick to solve such problems?
public class UniqueExtractor {
private int counter = 0;
public static void main(String... aArgs) throws IOException {
UniqueExtractor parser = new UniqueExtractor("/Users/xxx/Documents/xyz.txt");
long startTime = System.currentTimeMillis();
parser.processLineByLine();
parser.writeToFile();
long endTime = System.currentTimeMillis();
long total_time = endTime - startTime;
System.out.println("done in " + total_time/1000 + "seconds ");
}
public void writeToFile()
{
System.out.println("writing to a file");
try {
PrintWriter writer = new PrintWriter("/Users/xxx/Documents/xyz_unique.txt", "UTF-8");
Iterator it = map.entrySet().iterator();
StringBuilder sb = new StringBuilder();
while (it.hasNext()) {
sb.setLength(0);
Map.Entry pair = (Map.Entry)it.next();
sb.append(pair.getKey());
sb.append(pair.getValue());
writer.println(sb.toString());
writer.flush();
it.remove();
}
}
catch(Exception e)
{
e.printStackTrace();
}
}
public UniqueExtractor(String fileName)
{
fFilePath = fileName;
}
private HashMap<String, BigDecimal> map = new HashMap<String, BigDecimal>();
public final void processLineByLine() throws IOException {
try (Scanner scanner = new Scanner(new File(fFilePath))) {
while (scanner.hasNextLine())
{
//System.out.println("ha");
System.out.println(++counter);
processLine(scanner.nextLine());
}
}
}
protected void processLine(String aLine)
{
StringBuilder sb = new StringBuilder();
String[] split = aLine.split(" ");
BigDecimal bd = null;
BigDecimal bd1= null;
for (int i=0; i < split.length-1; i++)
{
//System.out.println(i);
//System.out.println();
sb.append(split[i]);
sb.append(" ");
}
bd= new BigDecimal((split[split.length-1]));
//System.out.print("key is" + sb.toString());
//System.out.println("value is "+ bd);
if (map.containsKey(sb.toString()))
{
bd1 = map.get(sb.toString());
int res = bd1.compareTo(bd);
if (res == -1)
{
System.out.println("replacing ...."+ sb.toString() + bd1 + " with " + bd);
map.put(sb.toString(), bd);
}
}
else
{
map.put(sb.toString(), bd);
}
sb.setLength(0);
}
private String fFilePath;
}
There are a couple main things you may want to consider to improve the performance of this program.
Avoid BigDecimal
While BigDecimal is very useful, it has a lot of overhead, both in speed and space requirements. According to your examples, you don't have very much precision to worry about, so I would recommend switching to plain floats or doubles. These would take a mere fraction of the space (so you could process larger files) and would probably be faster to work with.
Avoid StringBuilder
This is not a general rule, but applies in this case: you appear to be parsing and then rebuilding aLine in processLine. This is very expensive, and probably unnecessary. You could, instead, use aLine.lastIndexOf('\t') and aLine.substring to cut up the String with much less overhead.
These two should significantly improve the performance of your code, but don't address the overall algorithm.
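As a rough sketch of both suggestions combined, assuming the file really is tab-delimited and that double precision is enough for the values. The class name MaxPerKey and its methods are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

final class MaxPerKey {
    private final Map<String, Double> best = new HashMap<>();

    // Cuts the line at the last tab: everything before it is the key,
    // everything after it is the numeric value. No StringBuilder is needed.
    void accept(String line) {
        int cut = line.lastIndexOf('\t');
        if (cut < 0) {
            throw new IllegalArgumentException("no tab in line: " + line);
        }
        String key = line.substring(0, cut);
        double value = Double.parseDouble(line.substring(cut + 1));
        // Keeps the larger of the stored value and the new one.
        best.merge(key, value, Double::max);
    }

    Map<String, Double> result() {
        return best;
    }
}
```

Feeding every line of the input to accept and then writing out result() reproduces the original logic with one substring per line instead of a split and a rebuild. The map still holds one entry per unique key, so this reduces the per-entry overhead but does not remove the need for memory proportional to the number of unique records.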
Dataset splitting
You're trying to handle enough data that you might want to consider not keeping all of it in memory at once.
For example, you could split up your data set into multiple files based on the first field, run your program on each of the files, and then rejoin the files into one. You can do this with more than one field if you need more splitting. This requires less memory usage because the splitting program does not have to keep more than a single line in memory at once, and the latter programs only need to keep a chunk of the original data in memory at once, not the entire thing.
You may want to try the specific optimizations outlined above, and then see if you need more efficiency, in which case try to do dataset splitting.
I have solved, in various ways, a simple problem on CodeEval whose specification can be found here (only a few lines long).
I have made 3 working versions (one of them in Scala) and I don't understand the difference of performances for my last Java version which I expected to be the best time and memory-wise.
I also compared this to a code found on Github. Here are the performance stats returned by CodeEval :
. Version 1 is the version found on Github
. Version 2 is my Scala solution :
object Main extends App {
val p = Pattern.compile("\\d+")
scala.io.Source.fromFile(args(0)).getLines
.filter(!_.isEmpty)
.map(line => {
val dists = new TreeSet[Int]
val m = p.matcher(line)
while (m.find) dists += m.group.toInt
val list = dists.toList
list.zip(0 +: list).map { case (x,y) => x - y }.mkString(",")
})
.foreach(println)
}
. Version 3 is my Java solution which I expected to be the best :
public class Main {
public static void main(String[] args) throws IOException {
Pattern p = Pattern.compile("\\d+");
File file = new File(args[0]);
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
while ((line = br.readLine()) != null) {
Set<Integer> dists = new TreeSet<Integer>();
Matcher m = p.matcher(line);
while (m.find()) dists.add(Integer.parseInt(m.group()));
Iterator<Integer> it = dists.iterator();
int prev = 0;
StringBuilder sb = new StringBuilder();
while (it.hasNext()) {
int curr = it.next();
sb.append(curr - prev);
sb.append(it.hasNext() ? "," : "");
prev = curr;
}
System.out.println(sb);
}
br.close();
}
}
Version 4 is the same as version 3 except I don't use a StringBuilder to print the output and do like in version 1
Here is how I interpreted those results :
version 1 is too slow because of the too high number of System.out.print calls. Moreover, using split on very large lines (that's the case in the tests performed) uses a lot of memory.
version 2 seems slow too, but that is mainly because of an "overhead" of running Scala code on CodeEval; even very efficient code runs slowly on it
version 2 uses unnecessary memory to build a list from the set, which also takes some time but should not be too significant. Writing more efficient Scala would probably be like writing it in Java, so I preferred elegance to performance
version 3 should not use that much memory in my opinion. The use of a StringBuilder has the same impact on memory as calling mkString in version 2
version 4 proves the calls to System.out.println are slowing down the program
Does anyone see an explanation for these results?
I conducted some tests.
There is a baseline for every type of language. I code in java and javascript. For javascript here are my test results:
Rev 1: Default empty boilerplate for JS with a message to standard output
Rev 2: Same without file reading
Rev 3: Just a message to the standard output
You can see that no matter what, there will be at least 200 ms runtime and about 5 megs of memory usage. This baseline depends on the load of the servers as well! There was a time when CodeEval was heavily overloaded, making it impossible to run anything within the max time (10s).
Check this out, a totally different challenge than the previous:
Rev4: My solution
Rev5: The same code submitted again now. Scored 8000 more ranking point. :D
Conclusion: I would not worry too much about CPU and memory usage and rank. It is clearly not reliable.
Your scala solution is slow, not because of "overhead on CodeEval", but because you are building an immutable TreeSet, adding elements to it one by one. Replacing it with something like
val regex = """\d+""".r // in the beginning, instead of your Pattern.compile
...
.map { line =>
val dists = regex.findAllIn(line).map(_.toInt).toIndexedSeq.sorted
...
Should shave about 30-40% off your execution time.
Same approach (build a list, then sort) will, probably, help your memory utilization in "version 3" (java sets are real memory hogs). It is also a good idea to give your list an initial size while you are at it (otherwise, it'll grow by 50% every time it runs out of capacity, which is wasteful in both memory and performance). 600 sounds like a good number, since that's the upper bound for the number of cities from the problem description.
Now, since we know the upper boundary, an even faster and slimmer approach is to do away with lists and boxed Integers, and just do int dists[] = new int[600];.
If you wanted to get really fancy, you'd also make use of the "route length" range that's mentioned in the description. For example, instead of throwing ints into an array and sorting (or keeping a treeset), make an array of 20,000 bits (or even 20K bytes for speed), and set those that you see in input as you read it ... That would be both faster and more memory efficient than any of your solutions.
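The bit-array idea could be sketched with java.util.BitSet as below. The names (RouteGaps, gaps) are made up for illustration, and the bound of 20,000 is taken from the paragraph above rather than from the problem statement itself. Like the TreeSet in version 3, a BitSet silently drops duplicate distances:

```java
import java.util.BitSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RouteGaps {
    private static final Pattern NUMBER = Pattern.compile("\\d+");
    // Assumption: distances stay below this bound (BitSet grows if they do not).
    private static final int MAX_DISTANCE = 20_000;

    // Marks every distance found in the line, then walks the set bits in
    // ascending order and emits the gap between consecutive distances.
    static String gaps(String line) {
        BitSet seen = new BitSet(MAX_DISTANCE + 1);
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            seen.set(Integer.parseInt(m.group()));
        }
        StringBuilder sb = new StringBuilder();
        int prev = 0;
        for (int d = seen.nextSetBit(0); d >= 0; d = seen.nextSetBit(d + 1)) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(d - prev);
            prev = d;
        }
        return sb.toString();
    }
}
```

No sorting step is needed, because iterating over the set bits already visits the distances in increasing order. Allocating a fresh BitSet per line keeps the sketch simple; reusing one instance and calling clear() between lines would avoid the repeated allocation.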
I tried solving this question and figured that you don't need the names of the cities, just the distances in a sorted array.
It has much better runtime of 738ms, and memory of 4513792 with this.
Although this may not help improve your piece of code, it seems like a better way to approach the question. Any suggestions to improve the code further are welcome.
import java.io.*;
import java.util.*;
public class Main {
public static void main (String[] args) throws IOException {
File file = new File(args[0]);
BufferedReader buffer = new BufferedReader(new FileReader(file));
String line;
while ((line = buffer.readLine()) != null) {
line = line.trim();
String out = new Main().getDistances(line);
System.out.println(out);
}
}
public String getDistances(String s){
//split the string
String[] arr = s.split(";");
//create an array to hold the distances as integers
int[] distances = new int[arr.length];
for(int i=0; i<arr.length; i++){
//find the index of , - get the characters after that - convert to integer - add to distances array
distances[i] = Integer.parseInt(arr[i].substring(arr[i].lastIndexOf(",")+1));
}
//sort the array
Arrays.sort(distances);
String output = "";
output += distances[0]; //append the distance to the closest city to the string
for(int i=0; i<arr.length-1; i++){
//get distance between current element(city) and next
int distance_between = distances[i+1] - distances[i];
//append the distance to the string
output += "," + distance_between;
}
return output;
}
}