In my computer programming class in college, we were asked to create a Java program to read and store various elements in a list.
Storing is not a problem, but I'm having some issues reading the values. I'm using "variable = input.nextLine();" (if the variable is a String), and the problem is that when I read several values in a row, sometimes the program just skips reading one of them. My teacher recommended that I use input.nextLine() to fix that, but it's not working perfectly, and it seems like a workaround to me. I believe the problem is buffer-related. Is there something similar to C's fflush or fpurge functions in Java?
With a few exceptions, the input I/O classes don't have a flush method equivalent and shouldn't need one. The output I/O classes do have flush methods. You can see this at the JDK 6 Javadoc index for F by scrolling down to the collection of flush() methods.
Are you sure that you're not accidentally reading input.nextLine(); twice in a row and thus discarding one line?
We need to know more information before we can help solve your problem.
NOTE: Please edit your question to add additional information. Look for a small "edit" "button" below your question.
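One common cause of the skipped read described in the question, assuming the program mixes token reads such as nextInt() with nextLine() (the question does not show its code, so this is a guess): nextInt() stops before the line terminator, and the next nextLine() returns the empty remainder of that line. A minimal sketch, with made-up names (ScannerSkipDemo, readNaive, readLineBased):

```java
import java.util.Scanner;

final class ScannerSkipDemo {
    /** Reads an int and then a line WITHOUT consuming the leftover line terminator. */
    static String readNaive(String input) {
        Scanner in = new Scanner(input);
        int age = in.nextInt();       // stops right before the line terminator
        String name = in.nextLine();  // returns the empty rest of the "42" line
        return age + ":" + name;
    }

    /** Reads whole lines and parses them, so no terminator is left behind. */
    static String readLineBased(String input) {
        Scanner in = new Scanner(input);
        int age = Integer.parseInt(in.nextLine().trim());
        String name = in.nextLine();
        return age + ":" + name;
    }

    public static void main(String[] args) {
        String typed = "42\nAlice\n"; // stands in for what a user would type
        System.out.println(readNaive(typed));     // the name comes back empty
        System.out.println(readLineBased(typed)); // the name is read as expected
    }
}
```

If this is what is happening, no flushing is needed: either read everything with nextLine() and parse it, or call nextLine() once after each nextInt() to consume the rest of the line.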
If you are reading from a file, you could use a BufferedReader:
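import java.io.BufferedReader;
import java.io.FileReader;
// try-with-resources closes the reader even if an exception is thrown
try (BufferedReader reader = new BufferedReader(new FileReader("somelines_oh_yeah.txt"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // do something with line
    }
}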
You can use System.out.flush()
I am trying to read a very large file (~2GB). The content is one continuous string of sentences (I would like to split them on '.'). No matter what I try, I end up with an OutOfMemoryError.
BufferedReader in = new BufferedReader(new FileReader("a.txt"));
String read = null;
int i = 0;
while((read = in.readLine())!=null) {
String[] splitted = read.split("\\.");
for (String part: splitted) {
i+=1;
users.add(new User(i,part));
repository.saveAll(users);
}
}
also,
inputStream = new FileInputStream(path);
sc = new Scanner(inputStream, "UTF-8");
while (sc.hasNextLine()) {
String line = sc.nextLine();
// System.out.println(line);
}
// note that Scanner suppresses exceptions
if (sc.ioException() != null) {
throw sc.ioException();
}
Content of the file (composed of random words with a full stop after 10 words):
fmfbqi .xcdqnjqln kvjhw pexrbunnr cgvrqlr fpaczdegnb puqzjdbp gcfxne jawml aaiwwmo ugzoxn .opjc fmfbqi .xcdqnjqln kvjhw pexrbunnr cgvrqlr fpaczdegnb puqzjdbp gcfxne jawml aaiwwmo ugzoxn .opjc (so on)
Please help!
So first and foremost, based on comments on your question, as Joachim Sauer stated:
If there are no newlines, then there is only a single line and thus only one line number.
So your use case is faulty, at best.
Let's move past that, and assume maybe there are newline characters - or better yet, assume that the . character you're splitting on is intended to be a pseudo-replacement for a newline.
Scanner is not a bad approach here, though there are others. Since you provided a Scanner, let's continue with that, but you want to make sure you're wrapping it around a BufferedReader. You clearly don't have a lot of memory, and a BufferedReader lets you read a file in 'chunks' while you keep using the functionality of the Scanner; the buffering stays hidden from you as the caller:
Scanner sc = new Scanner(new BufferedReader(new FileReader(new File("a.txt")), 10*1024));
What this basically does is let the Scanner function as you expect while the BufferedReader buffers 10*1024 characters (about 10 KB) at a time, minimizing your memory footprint. Now set the delimiter and just keep calling next():
sc.useDelimiter("\\.");
for (int i = 0; sc.hasNext(); i++) {
    String pseudoLine = sc.next();
    // store line 'i' in your database for this pseudo-line
    // DO NOT store pseudoLine anywhere else - you don't have memory for it
}
Since you don't have enough memory, the clear thing to iterate (and re-iterate) is: don't store any part of the file in your JVM's heap space after reading it. Read it, use it how you need it, and allow it to be marked for JVM garbage collection. In your case, you mention you want to store the pseudo-lines in a database, so you want to read the pseudo-line, store it in the database, and just discard it.
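To make the "read it, use it, discard it" idea concrete, here is a small self-contained sketch. The names SentenceStreamer and forEachSentence are made up for illustration, and the handler stands in for whatever single-row database insert you use (that part is an assumption, not something from the question):

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;
import java.util.function.Consumer;

final class SentenceStreamer {
    /**
     * Feeds each '.'-delimited piece to the handler and returns how many were handled.
     * Nothing is collected here, so each piece can be garbage collected once handled.
     */
    static int forEachSentence(Reader source, Consumer<String> handler) {
        int count = 0;
        try (Scanner sc = new Scanner(new BufferedReader(source))) {
            sc.useDelimiter("\\.");
            while (sc.hasNext()) {
                String piece = sc.next().trim();
                if (piece.isEmpty()) {
                    continue; // e.g. two dots in a row, or whitespace after the last dot
                }
                handler.accept(piece); // hypothetical: save this one piece, keep no reference
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Reader demo = new StringReader("fmfbqi .xcdqnjqln kvjhw .opjc fmfbqi.");
        int n = forEachSentence(demo, piece -> System.out.println("store: " + piece));
        System.out.println(n + " pieces handled");
    }
}
```

For the real file you would pass a FileReader (or Files.newBufferedReader) instead of the StringReader. The important design choice is that the loop never adds the pieces to a list; the code in the question keeps every User in users and calls saveAll on the growing list each time.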
There are other things to point out here, such as configuring your JVM arguments, but I hesitate to even mention it because just setting your JVM memory high is a bad idea too - another brute force approach. There's nothing wrong with setting your JVM memory max heap size higher, but learning memory management is better if you're still learning how to write software. You'll get in less trouble later when you get into professional development.
Also, I mentioned Scanner and BufferedReader because you mentioned them in your question, but I think checking out java.nio.file.Files.lines(Path), as pointed out by deHaar, is also a good idea. It basically does the same thing as the code I've explicitly laid out, with the caveat that it still only handles one line at a time, without the ability to change what you're 'splitting' on. So if your text file has a single line in it, this will still cause you a problem and you will still need something like a Scanner to break the line into fragments.
This is to read a file faster not write it.
I have a 150MB file which has a JSON object inside it. I currently use the following code to read it:
String filename ="/tmp/fileToRead";
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(filename), Charset.forName("UTF-8")));
decompressedString = reader.readLine();
reader.close();
JSONObject obj = new JSONObject(decompressedString);
JSONArray profileData = obj.getJSONObject("profileData").getJSONArray("children");
....
It is a single-line file, and since it is JSON I can't split it (or at least I think so). Reading the file gives me an OutOfMemoryError or a TLE (time limit exceeded). The file takes more than 7 seconds to read, and that results in the TLE, since the execution of the whole code cannot go beyond 7 seconds. I get the OOM on decompressedString = reader.readLine();.
Is there a way I can reduce the memory used or the time it takes to be read completely?
You have several problems at hand:
You're preemptively parsing too much.
The error you get happens already when you read the line since you said "I get the OOM on decompressedString = reader.readLine();".
You should never try to read data line by line. BufferedReader.readLine() will block until you've read the character \r or \n or the sequence \r\n. When processing data of any length, you're never sure you'll get one of those characters. Also, you're never sure you'll only get one of those characters outside of the data itself. So your string may be too long or malformed. So don't ever pretend to know the format. BufferedReader.readLine() must be used when parsing, not when acquiring data.
You're not using an appropriate library for your use-case
Reading your JSON is important, yes, but you're reading too much at once. When creating your JSON, you might want to build it from a stream (one of InputStream, Reader or any nio's Channel/Buffer).
Currently you're making your JSON from a String. A huge one. So I can safely assume you're going to require at one point twice the memory you need. One time in the String and one time in the finalized object.
To reduce that, use an appropriate library to which you can pass one of the stream mentioned above. I mentioned in my comments the following: Gson, JSON.simple and Jackson.
Your file may be too big anyways.
You get your data, but you want only a subset of it (here, you want everything under {"profileData":{"children": <DATA>}}). You probably have way too much. How many elements exist at the same level as profileData? How many elements exist at the same level as children? Do you know? Probably way too many. Everything that is not under profileData.children is useless. What percentage of your total data is that? 50%? 90%? 99%?
To solve this, you probably want one of two things: you want less data or you want to be able to focus your request.
If you want less data, ask your data provider to give you less: only what you need. Why get more than that? It makes no sense. Tell him so and say "I want less".
If you want focused data, use a library that allows you to both parse and reduce the amount of data. You might want a library that lets you say: "parse this JSON and return only the profileData.children element". Unfortunately I know of no library that does it out of the box. If others do, please add a comment or answer. Apparently, Gson is able to do so if you use the JsonReader yourself and selectively use skipValue().
I am using java.io.PrintWriter to write some text to a text file.
I was wondering whether it is better to build everything I need to write in a variable and write it only once:
PrintWriter out = new PrintWriter(outputfile);
out.printf("%s", myvariablewithalltext);
or whether I can call the PrintWriter n times to write blocks of text in a for loop.
It works either way and there is not much more code; I was just wondering which is better.
In most cases it's better to write to the stream as you go. The main reason is that your variable might take too much memory, whereas a stream automatically flushes its content. Writing text into a variable is essentially manual buffering, and the better way to do that is to use an appropriate buffering stream/writer. In your case you can just use java.io.BufferedWriter, like so:
BufferedWriter out = new BufferedWriter(new PrintWriter("file.txt"));
or, if you prefer PrintWriter interface, you can do this
PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("file.txt")));
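As a rough sketch of the "write in a loop through a buffered writer" option: the class and method names below (ChunkedWriteDemo, writeBlocks, roundTrip) are made up, the temp file exists only so the example can run anywhere, and Files.newBufferedWriter is swapped in for new BufferedWriter(new FileWriter(...)) so the charset is explicit.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class ChunkedWriteDemo {
    /** Writes one line per block; the BufferedWriter batches the actual disk writes. */
    static void writeBlocks(Path target, int blocks) {
        try (BufferedWriter buffered = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
             PrintWriter out = new PrintWriter(buffered)) {
            for (int i = 0; i < blocks; i++) {
                out.printf("block %d%n", i); // no big String is ever built up
            }
            if (out.checkError()) { // PrintWriter swallows IOExceptions, so ask explicitly
                throw new UncheckedIOException(new IOException("write failed: " + target));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: writes to a temp file, reads the lines back, then deletes the file. */
    static List<String> roundTrip(int blocks) {
        try {
            Path tmp = Files.createTempFile("chunked", ".txt");
            try {
                writeBlocks(tmp, blocks);
                return Files.readAllLines(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(3));
    }
}
```

The try-with-resources block closes (and therefore flushes) both writers, so there is no need to call flush() by hand.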
Assuming you are open to other suggestions (not just the two you mentioned in your question): if all you want is a clean way of writing text to a file, there are of course multiple solutions. Here are a few:
Using PrintWriter.
example:
String contentToWrite = "This is some random Text";
PrintWriter writerToFile = new PrintWriter("TheOutputFile.txt");
writerToFile.print(contentToWrite);
writerToFile.close();
Using FileOutputStream
example:
String contentToWrite = "This is some random Text";
FileOutputStream fileOPS = new FileOutputStream("TheOutputFile.txt");
fileOPS.write(contentToWrite.getBytes());
fileOPS.close();
Using Files
Using FileWriter along with BufferedWriter
Using FileUtils by apache.commons.io
Using Files by guava
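The "Using Files" entry above has no example, so here is a minimal sketch with java.nio.file.Files (Java 7+). FilesWriteDemo and writeAndReadBack are made-up names, and the temp file is only there so the demo cleans up after itself:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FilesWriteDemo {
    /** Writes the text in a single call, reads it back, and removes the temp file. */
    static String writeAndReadBack(String contentToWrite) {
        try {
            Path target = Files.createTempFile("TheOutputFile", ".txt");
            try {
                // Files.write opens, writes and closes the file in one call
                Files.write(target, contentToWrite.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
            } finally {
                Files.delete(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadBack("This is some random Text"));
    }
}
```

In real code you would pass your own path, for example Paths.get("TheOutputFile.txt"), and skip the read-back and delete steps.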
Some approaches here just take the content (already in string form, so no parsing or conversion is required) and write it to a file. [ no parsing/conversion -> less code -> cleaner code ]. ;)
Some do not require you to nest objects. [ fewer objects -> less code -> cleaner code ]. ;)
Of course, usage depends on your implementation, but I hope this helps you decide what best suits your requirements.
Note: every class name I mentioned is a link to its reference document.
It is the latter. There is no good reason whatsoever to put the entire content into a variable, just to write it in a file.
If you have some additional use for that variable beyond writing to file, that might change things a little bit, but even then, there is, probably, a better way.
I think it depends on the length of your content.
If you have just a little text, it's better to keep it all in memory and write it in one shot.
But if your content is very large, or if some parts take a long time to compute, you should probably write it piece by piece to avoid keeping a huge amount of data in memory.
Given a string I #eat# #hamburgers# and a StringBuilder eat: [eat, consume, like] hamburgers: [hamburgers, spinach, bananas], I want to randomly replace the words within hashmarks with randomly chosen ones from their wordbanks, so that phrases such as I like bananas and I consume spinach will be generated. Code to randomly select another word, given a token (i.e. eat, hamburgers) has been written.
I need to use this regex #[^#]+# to find words within the initial string contained by hashmarks, pass them to the replace method, and then put their random correlates back inside the initial string. I tried using StringTokenizer, but realized it's not the tool for the job.
I need to somehow extract the first word within hashmarks and pass it to the method calling for its replacement before calling the method archetypeString(#[^#]+#, replacement) in such a way so that when the loop runs again, both the word grabber&passer-to method and the replacement method are then working with the second hashed word.
tokenizer dead-end:
StringTokenizer stt = new StringTokenizer(archetype);
while(stt.hasMoreTokens()){
String temp = stt.nextToken();
if(temp.charAt(0)=='#');
}
and the getPhrases method:
public List<String> getPhrases(StringBuilder fileContent, String token) {
StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(fileContent.toString()));
List<String> list = new ArrayList<String>();
try {
while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
if (tokenizer.sval.equals(token)) {
tokenizer.nextToken(); // '['
do {
tokenizer.nextToken(); // go to the number
list.add(String.valueOf(tokenizer.sval));
} while (tokenizer.nextToken() == ',');
break;
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return list;
}
It is not clear from your question whether this is part of some sadistic homework assignment or just the first way you thought of to solve whatever problem you're trying to solve. This is not a regular expression problem any more than it's a StringTokenizer problem.
Look at String.format(), and the formatting capabilities of Formatter. I do not understand why you would ever need to know what the last string you generated was if your object is to generate the next one at random. Just pick a new random value and format it with String.format().
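A minimal sketch of the String.format() idea, assuming the two word banks from the question are available as plain lists (PhraseGenerator, pick and phrase are made-up names, and the banks are hard-coded here only for the demo):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class PhraseGenerator {
    // Word banks copied from the question; in the real program they come from the file.
    static final List<String> EAT = Arrays.asList("eat", "consume", "like");
    static final List<String> FOOD = Arrays.asList("hamburgers", "spinach", "bananas");

    /** Picks one random entry from a word bank. */
    static String pick(List<String> bank, Random rng) {
        return bank.get(rng.nextInt(bank.size()));
    }

    /** Builds one phrase; every call chooses fresh random words for both slots. */
    static String phrase(Random rng) {
        return String.format("I %s %s", pick(EAT, rng), pick(FOOD, rng));
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 0; i < 3; i++) {
            System.out.println(phrase(rng));
        }
    }
}
```

Each %s slot takes the place of one #word# marker, so nothing has to be found and replaced in the string afterwards.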
--
After reading your comment to this answer and looking at the question you referred to, I'm going to make a couple of recommendations.
(1) Start with a simpler coding assignment or two, something without regular expressions. Make sure you absolutely understand the following concepts: instance variables, variable scope, public methods versus private methods, passing parameters to methods, and returning values from methods. You can do quite a bit with just that much. You don't need to study inheritance until you have all of those down cold, and I recommend that you do not try.
(2) for each coding assignment for at least your first 5, make sure you have written out what your program is to be provided as data and what output it is supposed to produce. List any constraints someone has given you separately (must use class X, must display error message, whatever).
(3) Put opening braces and closing braces on lines by themselves; match each opening brace with a closing brace indented the same amount. Indent code within each pair of braces another 2 or 3 spaces further to the right. This means that brace pairs inside other brace pairs will be indented further. I know this is not the way you see most code, and plenty of people will tell you that it is "wrong". But until you get comfortable with scope and whether a given place in your code is inside or outside a method or a loop, I think it best that you give yourself these extra visual cues. For someone not familiar with other ways of doing things, this is easiest.
(4) be careful of your terms when posting here. In the other question you refer to, you say it is about inheritance, but it uses "implements", indicating that it is implementing an interface, not inheriting from a class. It is confusing to those of us trying to help you if you get the terminology wrong.
(5) when you post here: post the entire program (these early assignments should all be under 100 lines total, no reason not to post all of it). Make sure it is properly indented; use spaces instead of tabs. In text, and maybe also in comments, point out the place in the code where you seem to have the problem (if you know). If there is an error message, post the entire error message (don't tell us what it is, and don't try to interpret it for us). Work on your code until you have a specific question: why do I get a compile error here? Why do I get (or fail to get) this output? The program outputs X but I expected Y, why is that? etc.
We're not a tutorial shop; most of us need instruction to learn to program, and you need to get most of that somewhere besides here. We are willing to help with your questions, given that your questions are specific and reasonable and you aren't expecting us to provide the instruction. By itself, "I'm lost and need help" is a bit beyond StackOverflow's normal way of operating.
I am writing a utility in Java that reads a stream which may contain both text and binary data. I want to avoid waiting on I/O. To do that I create a thread that keeps reading the data (and waiting for it) and puts it into a buffer, so the clients can check availability and stop waiting whenever they want (by closing the input stream, which generates an IOException and ends the wait). This works very well as far as reading bytes, i.e. binary data, is concerned.
Now I also want to make it easy for the client to read lines from it, with something like '.hasNextLine()' and '.readLine()'. Without using an I/O-blocking stream such as a buffered stream, (Q1) how can I check whether a byte[] contains a valid Unicode line (in the form of the length of the first line)? I looked around the String/Charset API but could not find it (or did I miss it?). (NOTE: If possible I don't want to use a non-built-in library.)
Since I could not find one, I try to create one. Without being so complicated, here is my algorithm.
1). I look from the start of the byte array until I find '\n' or '\r' without '\n'.
2). Then, I cut the byte array from the start to that point and using it to create a string (with CharSet if specified) using 'new String(byte[])' or 'new String(byte[], CharSet)'.
3). If that succeeds without an exception, we have found the first valid line and return it.
4). Otherwise, these bytes may not be a string, so I look further to another '\n' or '\r' w/o '\n'. and this process repeat.
5). If the search reaches the end of the available bytes, I stop and return null (no valid line found).
My question is (Q2): Is the above algorithm adequate?
Just when I was about to implement it, I searched on Google and found that there are many other codes for new line, for example U+2424, U+0085, U+000C, U+2028 and U+2029.
So my last question is (Q3): Do I really need to detect these codes? If I do, will it increase the chance of false alarms?
I am well aware that recognizing something in binary data is not absolute. I am just trying to find the best balance.
To sum up, I have an array of bytes and I want to extract the first valid string line from it, with or without a specific Charset. This must be done in Java and avoid using any non-built-in library.
Thank you all in advance.
I am afraid your problem is not well-defined. You write that you want to extract the "first valid string line" from your data. But whether some byte sequence is a "valid string" depends on the encoding. So you must decide which encoding(s) you want to use in testing.
Sensible choices would be:
the platform default encoding (Java property "file.encoding")
UTF-8 (as it is most common)
a list of encodings you know your clients will use (such as several Russian or Chinese encodings)
What makes sense will depend on the data, there's no general answer.
Once you have your encodings, the problem of line termination should follow, as most encodings have rules on what terminates a line. In ASCII or Latin-1, LF,CR-LF and LF-CR would suffice. On Unicode, you need all the ones you listed above.
But again, there's no general answer, as new line codes are not strictly regulated. Again, it would depend on your data.
First of all, let me ask you a question: is the data you are trying to process legacy data? In other words, are you responsible for the format of the input stream that you are trying to consume here?
If you do control the input format, then you probably want to take the binary-vs-text decision out of the Q1 algorithm. For me this algorithm has one troubling part.
`4). Otherwise, these bytes may not be a string, so I look further to
another '\n' or '\r' w/o '\n'. and this process repeat.`
Are you discarding the input before the line terminator and taking the bytes that start immediately after it, or are you re-evaluating the string, which now contains two line terminators? If the former, you may have broken the binary data interface; if the latter, you may still not parse the text correctly.
I think having well defined markers for binary data and text data in your stream will simplify your algorithm a lot.
A couple of words on the String constructor. new String(byte[], Charset) will not throw any exception if the byte array is not valid in that particular Charset; instead it replaces the malformed input with the charset's replacement character, which typically shows up as question marks (probably not what you want). If you want an exception, you should use CharsetDecoder.
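Here is a rough sketch of what strict decoding with CharsetDecoder could look like, combined with the "look for the next terminator" steps from the question. StrictDecode, decodeOrNull and firstValidLine are made-up names, and scanning for the raw bytes '\n' and '\r' assumes an ASCII-compatible encoding such as UTF-8 or Latin-1 (it would not work for UTF-16):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictDecode {
    /** Returns the decoded text, or null when the bytes are not valid in the charset. */
    static String decodeOrNull(byte[] bytes, int offset, int length, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // throw instead of replacing
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes, offset, length))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /**
     * Tries each '\n' or '\r' in turn as the end of the first line and returns the
     * first prefix that decodes cleanly, or null if none does.
     */
    static String firstValidLine(byte[] bytes, Charset charset) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n' || bytes[i] == '\r') {
                String line = decodeOrNull(bytes, 0, i, charset);
                if (line != null) {
                    return line;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] text = "h\u00e9llo\nrest".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstValidLine(text, StandardCharsets.UTF_8));
        byte[] broken = {(byte) 0xC3, '\n'}; // truncated two-byte UTF-8 sequence
        System.out.println(firstValidLine(broken, StandardCharsets.UTF_8));
    }
}
```

Note that a clean decode only says the bytes are valid in that charset; binary data can pass the check by accident, which is the ambiguity the question already acknowledges.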
Also note that in Java 6 there are 2 constructors that take charset
String(byte[] bytes, String charsetName) and String(byte[] bytes, Charset charset). I did some simple performance test a while ago, and constructor with String charsetName is magnitudes faster than the one that takes Charset object ( Question to Sun: bug, feature? ).
I would try this:
make the IO reader put strings/lines into a thread safe collection (for example some implementation of BlockingQueue)
the main code has only reference to the synced collection and checks for new data when needed, like queue.peek(). It doesn't need to know about the io thread nor the stream.
Some sketch Java code (imports omitted; myStreamFromSomewhere is a placeholder):
class IORunner extends Thread {
    private final BufferedReader reader;
    private final BlockingQueue<String> outputQueue;

    IORunner(InputStream in, BlockingQueue<String> outputQueue) {
        this.reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        this.outputQueue = outputQueue;
    }

    @Override
    public void run() {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                outputQueue.put(line);
            }
        } catch (IOException | InterruptedException e) {
            // stream closed or thread interrupted: stop reading
        }
    }
}
class Main {
    public static void main(String[] args) throws InterruptedException {
        ...
        BlockingQueue<String> dataQueue = new LinkedBlockingQueue<>();
        new IORunner(myStreamFromSomewhere, dataQueue).start();
        while (true) {
            if (!dataQueue.isEmpty()) { // can also use .peek() != null
                System.out.println(dataQueue.take());
            }
            Thread.sleep(1000);
        }
    }
}
The collection decouples the input(stream) more from the main code. You can also limit the number of lines stored/mem used by creating the queue with a limited capacity (see blockingqueue doc).
The BufferedReader handles the checking of new lines for you :) The InputStreamReader handles the charset (recommend setting one yourself since the default one changes depending on OS etc.).
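For the limited-capacity variant, a self-contained sketch could look like the following. BoundedLineQueueDemo, startReader, drain and the EOF sentinel are made-up names, and using a sentinel object to signal the end of the stream is my own assumption about how the consumer should learn that the reader is done:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class BoundedLineQueueDemo {
    // Unique marker object, compared by identity, so no real line can be mistaken for it.
    static final String EOF = new String("<eof>");

    /** Starts a daemon thread that pushes lines into a queue holding at most `capacity`. */
    static BlockingQueue<String> startReader(InputStream in, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        Thread reader = new Thread(() -> {
            try {
                try (BufferedReader r = new BufferedReader(
                        new InputStreamReader(in, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = r.readLine()) != null) {
                        queue.put(line); // blocks while the queue is full
                    }
                } catch (IOException e) {
                    // a closed or failed stream is treated as end of input
                }
                queue.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.setDaemon(true);
        reader.start();
        return queue;
    }

    /** Takes lines until EOF arrives or no line shows up within the timeout. */
    static List<String> drain(BlockingQueue<String> queue, long timeoutMillis) {
        List<String> lines = new ArrayList<>();
        try {
            while (true) {
                String line = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (line == null || line == EOF) {
                    return lines;
                }
                lines.add(line);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return lines;
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(drain(startReader(in, 1), 2000));
    }
}
```

poll with a timeout lets the client decide how long it is willing to wait, instead of sleeping for a fixed second between checks.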
The java.text namespace is designed for this sort of natural language operation. The BreakIterator.getLineInstance() static method returns an iterator that detects line breaks. You do need to know the locale and encoding for best results, though.
Q2: The method you use seems reasonable enough to work.
Q1: Can't think of something better than the algorithm that you are using
Q3: I believe it will be enough to test for \r and \n. The others are too exotic for usual text files.
I just solved this to get a test stub working for Datagram. I did byte[] varName = String.getBytes(); then final int len = varName.length; then sent the int through a DataOutputStream, followed by the byte array. On the receiving end, just do readInt() and then read that many bytes.
Not a lib, not hard to do either. Just read up on readUTF and do what they did for the bytes.
The string should construct from the byte array recovered that way, if not you have other problems. If the string can be reconstructed, it can be buffered ... no?
May be able to just use read / write UTF() in DataStream - why not?
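A quick sketch of the writeUTF()/readUTF() route, using in-memory streams so it runs on its own (UtfRoundTrip, encode and decode are made-up names). writeUTF() writes a two-byte length followed by the text in Java's modified UTF-8, so the encoded text is limited to 65535 bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class UtfRoundTrip {
    /** Frames the text as a 2-byte length prefix plus modified UTF-8 bytes. */
    static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back one string that was framed by writeUTF(). */
    static String decode(byte[] frame) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] frame = encode("fdsfjal;sajssaafe8e88e88aa");
        System.out.println(frame.length + " bytes on the wire");
        System.out.println(decode(frame));
    }
}
```

This only helps when you control both ends of the stream, since the sender has to frame every string this way.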
{ edit: per OP's request }
// Sending end ("sink" is a placeholder for whatever OutputStream you write to)
String data = "fdsfjal;sajssaafe8e88e88aa"; // fingers pounding keyboard
byte[] payload = data.getBytes(StandardCharsets.UTF_8);
DataOutputStream dataOutputStream = new DataOutputStream(sink);
dataOutputStream.writeInt(payload.length); // length in bytes, not in characters
dataOutputStream.write(payload);
dataOutputStream.flush();
dataOutputStream.close();
// rcv end ("source" is the InputStream you read from)
DataInputStream dataInputStream = new DataInputStream(source);
final int sizeToRead = dataInputStream.readInt();
byte[] datasink = new byte[sizeToRead];
dataInputStream.readFully(datasink); // plain read() may return fewer bytes than asked for
dataInputStream.close();
// constructor
// String(byte[] bytes, int offset, int length, Charset charset)
final String result = new String(datasink, 0, sizeToRead, StandardCharsets.UTF_8);
// continue coding here
Do me a favor, keep the heat off of me. This was written very fast, right in the posting tool, so the code probably contains substantial errors. It's faster for me just to explain it by writing Java; there will be others who can translate it to other languages, which you can do too if you want it in another codebase. You will need exception trapping and so on; just do a compile and start fixing errors. When you get a clean compile, start over from the beginning and look for blunders. (That's what a blunder is called in engineering: a blunder.)