I have written some data to HDFS, but I want it without the two leading bytes that the writeUTF() method writes. I want to copy this file, free of those first two bytes, to a local file and do some analysis on it.
if (fs.exists(filenamePath)) {
    // remove the file first
    //fs.delete(filenamePath);
    out = fs.append(filenamePath);
}
// create the file if it doesn't exist
else {
    out = fs.create(filenamePath);
}
out.writeUTF(getFeaturesString(searchCriteriaList, fileNameData));
out.close();
The data written is as follows
0aEX Series ex4200-24f....
I want only
EX Series ex4200-24f
I write all the data to an HDFS file and then copy the file to local disk to do some analysis. Is there an alternative way to accomplish this?
How can I ignore the first two bytes written by writeUTF() and writeChars() in HDFS?
You've just answered your own question. Use writeChars(). It writes no length prefix, although it does write each character as two bytes.
writeUTF() is only useful when somebody is going to call readUTF() to read the data back. It writes a two-byte length prefix followed by the string in modified UTF-8, a format that only readUTF() understands.
There's no particular reason to use DataOutputStream here either. If the data is all text, use a BufferedWriter.
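A minimal sketch of the difference, using only java.io and an in-memory stream so it runs anywhere. The class and method names here are illustrative, not from the question. The same wrapping should work around the stream returned by fs.create() or fs.append(), since that is an ordinary OutputStream, but that part is an assumption and is not exercised below.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class PlainTextWrite {

    // writeUTF(): a two-byte length prefix, then the text in modified UTF-8.
    static byte[] withWriteUTF(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(text);
        }
        return sink.toByteArray();
    }

    // Writer: only the encoded characters, with no prefix.
    static byte[] withWriter(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeText(sink, text);
        return sink.toByteArray();
    }

    // Wraps any OutputStream in a BufferedWriter with an explicit charset.
    static void writeText(OutputStream rawOut, String text) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(rawOut, StandardCharsets.UTF_8))) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "EX Series ex4200-24f";
        // The writeUTF() output is two bytes longer than the Writer output.
        System.out.println(withWriteUTF(text).length);
        System.out.println(withWriter(text).length);
    }
}
```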
Related
I have this code for transferring text from one file to another, but it's not working. It only transfers the second and the third letter from the first file to the second, not the whole text. Can anyone help me?
FileInputStream br=new FileInputStream(ss);
FileOutputStream rb=new FileOutputStream(sd);
while(br.read()!=-1){
rb.write(br.read());
}
if(br!=null)
br.close();
if(rb!=null)
rb.close();
}
Every call to read() consumes one byte. Since you call it twice per loop iteration, the byte read in the while condition is thrown away and only every second byte is written.
You need to do something like this:
int b;
while ((b = br.read()) != -1) {
    rb.write(b);
}
Even better would be to use a larger buffer to read and write the bytes in order to have fewer read and write calls.
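A sketch of the buffered variant, assuming an 8 KB buffer (the size is an arbitrary choice) and an illustrative class name. It works on any InputStream/OutputStream pair, so the demo uses in-memory streams instead of the files from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedCopy {

    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read(buffer) returns how many bytes were actually placed in the buffer.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes that were read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello world".getBytes());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(in, out) + " bytes copied");
    }
}
```

With files, the two streams would be a FileInputStream and a FileOutputStream, ideally opened in a try-with-resources statement so both are closed even if the copy fails.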
I'm working with a sensor that takes data and gives it to me whenever I call on it. I want to collect 1 minute of data in an ArrayList and then save it to a file or something so that I can analyze it at a different time.
So I have something like this:
ArrayList<DataObject> data = new ArrayList<DataObject>();
public void onUpdate(DataObject d) { // called by my sensor every second
    data.add(d);
}
I want to save the ArrayList data to a file on my computer so that I can later feed it into a different program and use it there.
How do I go about doing this?
If you want to save these as CSV files, they'll be easily exportable and importable, even to Excel (which may be of value for doing further work or passing on results).
Check out OpenCSV and in particular this entry in the FAQ relating to writing the data out.
e.g.
CSVWriter writer = new CSVWriter(new FileWriter("yourfile.csv"), ',');
// feed in your array (or convert your data to an array)
String[] entries = "first#second#third".split("#");
writer.writeNext(entries);
writer.close();
I think you should just write the values to a file with a delimiter between them, then read the file into an array in the other program. To get the list into a file, loop over it and append each value to the file until you reach the end.
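A rough sketch of that approach with the standard library only. The Reading class, its two fields and the comma delimiter are assumptions made for illustration, since the question does not show what DataObject contains.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadingsFile {

    // Hypothetical stand-in for the question's DataObject.
    static final class Reading {
        final long timestampMillis;
        final double value;

        Reading(long timestampMillis, double value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    // Writes one "timestamp,value" line per reading.
    static void save(List<Reading> data, Path file) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (Reading r : data) {
                w.write(r.timestampMillis + "," + r.value);
                w.newLine();
            }
        }
    }

    // Reads the lines back and rebuilds the list.
    static List<Reading> load(Path file) throws IOException {
        List<Reading> data = new ArrayList<>();
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                String[] parts = line.split(",");
                data.add(new Reading(Long.parseLong(parts[0]), Double.parseDouble(parts[1])));
            }
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        List<Reading> data = new ArrayList<>();
        data.add(new Reading(1000L, 21.5));
        data.add(new Reading(2000L, 21.75));
        Path file = Files.createTempFile("readings", ".csv");
        save(data, file);
        System.out.println(load(file).size() + " readings restored");
        Files.delete(file);
    }
}
```

A hand-rolled format like this breaks as soon as a field can itself contain the delimiter, which is where a CSV library earns its keep.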
If the other program is also written in Java, you could use the java.io.Serializable interface. Here is a tutorial: Java - Serialization.
It would be best to use ObjectOutputStream for this, since the output of the sensor is an integer or a double. The writeObject() method will do the job.
See the link for details:
http://docs.oracle.com/javase/7/docs/api/java/io/ObjectOutputStream.html
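A small sketch of the serialization route, assuming the readings are plain doubles held in an ArrayList<Double> (both ArrayList and Double are Serializable). The class name is illustrative, and a custom DataObject would itself have to implement Serializable for this to work.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;

public class SerializeList {

    // Writes the whole list as a single serialized object.
    static void save(ArrayList<Double> data, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(data);
        }
    }

    // Reads the list back; the cast is unchecked because of type erasure.
    @SuppressWarnings("unchecked")
    static ArrayList<Double> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (ArrayList<Double>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<Double> data = new ArrayList<>();
        data.add(21.5);
        data.add(22.0);
        Path file = Files.createTempFile("readings", ".ser");
        save(data, file);
        System.out.println(load(file));
        Files.delete(file);
    }
}
```

The resulting file is binary and can only be read back by Java code, so this fits only when the consuming program is also Java.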
I have a function in which I am only given a BufferedInputStream and no other information about the file to be read. I unfortunately cannot alter the method definition as it is called by code I don't have access to. I've been using the code below to read the file and place its contents in a String:
public String[] doImport(BufferedInputStream stream) throws IOException, PersistenceException {
    int bytesAvail = stream.available();
    byte[] bytesRead = new byte[bytesAvail];
    stream.read(bytesRead);
    stream.close();
    String fileContents = new String(bytesRead);
    //more code here working with fileContents
}
My problem is that for large files (>2 GB), this code causes the program to either run extremely slowly or truncate the data, depending on the computer the program is executed on. Does anyone have a recommendation regarding how to deal with large files in this situation?
You're assuming that available() returns the size of the file; it does not. It returns the number of bytes available to be read, and that may be any number less than or equal to the size of the file.
Unfortunately there's no way to do what you want in just one shot without some other source of information on the length of the file data (e.g., calling java.io.File.length()). Instead, you have to accumulate the data from multiple reads. One way is to use a ByteArrayOutputStream: read into a fixed-size array, then write the data you read into the ByteArrayOutputStream. At the end, pull the byte array out. You'll need to use the three-argument forms of read() and write(), and check the return value of read() so you know exactly how many bytes were read into the buffer on each call.
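The accumulation loop described above might look like the following sketch. The class name, method name and 8 KB chunk size are illustrative choices, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamSlurp {

    // Reads the stream to the end and returns everything that was read.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() may fill only part of the chunk; n says how much is valid.
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            collected.write(chunk, 0, n);
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] result = readAll(new ByteArrayInputStream("some file contents".getBytes()));
        System.out.println(result.length + " bytes read");
    }
}
```

Note that a Java array cannot hold more than Integer.MAX_VALUE elements, so for input larger than about 2 GB this still cannot return everything in one byte[]. For files that size, processing each chunk as it arrives is likely the only workable option.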
I'm not sure why you don't think you can read it line-by-line. BufferedInputStream only describes how the underlying stream is accessed; it doesn't impose any restrictions on how you ultimately read data from it. You can use it just as if it were any other InputStream.
Namely, to read it line-by-line you could do:
InputStreamReader streamReader = new InputStreamReader(stream);
BufferedReader lineReader = new BufferedReader(streamReader);
String line = lineReader.readLine();
...
[Edit] This response is to the original wording of the question, which asked specifically for a way to read the input file line-by-line.
I am using Java to read a TSV file that is 4 GB in size, and I wanted to know if there is a way for Java to tell me how far it is through the task while the program is running. I'm thinking a file stream might be able to tell me how many bytes it has read, and I could do some simple math with that.
A plain stream or reader doesn't count the number of bytes / characters read.
I think you might be looking for ProgressMonitorInputStream.
If you don't want / need the Swing integration, then another alternative is to write a custom subclass of FilterReader or FilterInputStream that counts the characters/bytes read and provides a getter for reading the count. Then put the custom class into your input stack at the appropriate point.
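A sketch of such a subclass for the byte-stream case. The class name and the getCount() getter are made up for illustration. It does not adjust the count for mark()/reset(), so it assumes the stream is read straight through.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CountingInputStream extends FilterInputStream {

    private long count;

    public CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++;
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so it is counted too.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        if (skipped > 0) {
            count += skipped;
        }
        return skipped;
    }

    // Number of bytes consumed from the underlying stream so far.
    public long getCount() {
        return count;
    }
}
```

To report progress on a local file, the counting stream would sit directly on top of the FileInputStream, below any reader or buffering layer, and getCount() would be compared against the file's length. Because upper layers read ahead in blocks, the figure is an approximation of how much has been processed.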
As you read from the stream, keep a tally of bytes read. For example, if you are reading byte arrays directly from the stream:
long bytesReadTotal = 0L;
int bytesRead = stream.read(bytes);
while (bytesRead != -1) {
    bytesReadTotal += bytesRead;
    // process these bytes ...
    bytesRead = stream.read(bytes);
}
If you read this file over HTTP, the "Content-Length" header can tell you the total number of bytes to expect, so you know your progress while you are reading.
If you read the file over a raw TCP/UDP connection, I guess you are writing both the client and the server for the transfer, so the server can send the file length to the client first and then send the file.
If you just read a local file, this is not a problem: the file size is available up front, for example from java.io.File.length().
How can I store data in a text file in Java? The data has various attributes, such as name and author, that the user will enter on the command line.
Any ideas?
Thanks
It sounds like the Java class that will suit you best is a FileWriter. However, if you are writing a file with Key=Value lines, then the Properties class might end up being the better choice.
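A minimal sketch of the Properties route, assuming a single record with the two attributes named in the question. The class name, key names and comment string are illustrative.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class BookProperties {

    // Stores the attributes as key=value lines in a text file.
    static void save(Path file, String name, String author) throws IOException {
        Properties props = new Properties();
        props.setProperty("name", name);
        props.setProperty("author", author);
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(w, "book record");
        }
    }

    // Loads the key=value lines back into a Properties object.
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(r);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("book", ".properties");
        save(file, "Dune", "Frank Herbert");
        System.out.println(load(file).getProperty("author"));
        Files.delete(file);
    }
}
```

A Properties file holds one value per key, so this suits a single record. For many records with the same attributes, a row-per-record format is probably the better fit.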
"[S]tore data in text file" sounds like you want a readable format. You can use comma-separated value (CSV) files.
You can write your own CSV serializer (search on SO for "how to write csv java") or use a solution like the Java CSV library.
Use DataOutputStream and DataInputStream. These classes make it easier to read int, float, double and other primitive data without having to work out whether a value that was read should be an integer or a float.
Something like this:
FileOutputStream fos = new FileOutputStream("cities.dat");
DataOutputStream dos = new DataOutputStream(fos);
//
// Below we write some data to cities.dat.
// The DataOutputStream class has various methods that allow
// us to write primitive-type data and strings, such as
// writeInt(), writeFloat(), writeUTF(), etc.
//
dos.writeInt(cityIdA);
dos.writeUTF(cityNameA);
dos.writeInt(cityPopulationA);
dos.writeFloat(cityTempA);
dos.writeInt(cityIdB);
dos.writeUTF(cityNameB);
dos.writeInt(cityPopulationB);
dos.writeFloat(cityTempB);
dos.flush();
dos.close();
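Reading the data back means calling the matching read methods in exactly the order the values were written. The sketch below shows one record going out and coming back. The class and method names are illustrative, and it uses an in-memory stream rather than cities.dat so that it is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CityRecords {

    // Writes one record: int id, UTF string name, int population, float temperature.
    static void write(OutputStream rawOut, int id, String name, int population, float temp)
            throws IOException {
        DataOutputStream dos = new DataOutputStream(rawOut);
        dos.writeInt(id);
        dos.writeUTF(name);
        dos.writeInt(population);
        dos.writeFloat(temp);
        dos.flush();
    }

    // Reads one record back, field by field, in the same order it was written.
    static String read(InputStream rawIn) throws IOException {
        DataInputStream dis = new DataInputStream(rawIn);
        int id = dis.readInt();
        String name = dis.readUTF();
        int population = dis.readInt();
        float temp = dis.readFloat();
        return id + "," + name + "," + population + "," + temp;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        write(sink, 1, "Paris", 2000000, 12.5f);
        System.out.println(read(new ByteArrayInputStream(sink.toByteArray())));
    }
}
```

Keep in mind that this produces a binary file, not text a person can open in an editor, so it only answers the question if human readability is not required.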