Read bytes from file Java - java
I'm trying to parse a file that keeps all its data in binary form. How do I read N bytes from the file starting at offset M? I then need to convert them to a String using new String(myByteArray, "UTF-8");. Thanks!
Here's some code:
File file = new File("my_file.txt");
byte[] myByteArray = new byte[(int) file.length()];
UPD 1: The answers I see are not appropriate. My file keeps strings in byte form. For example, when I put the string "str" in my file, it actually prints something like [B@6e0b... in my file. So I need to get my string "str" back from these bytes.
UPD 2: It turns out the problem appears when I use toString():
PrintWriter writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(System.getProperty("db.file")), true), "UTF-8")));
Iterator it = storage.entrySet().iterator();//storage is a map<String, String>
while (it.hasNext()){
Map.Entry pairs = (Map.Entry)it.next();
String K = new String(pairs.getKey().toString());
String V = new String(pairs.getValue().toString());
writer.println(K.length() + " " + K.getBytes() + " " + V.length() + " " + V.getBytes());//this is just the format I need to have in file
it.remove();
}
Maybe there are other ways to do that?
As of Java 7, reading the whole of a file is really easy - just use Files.readAllBytes(path). For example:
Path path = Paths.get("my_file.txt");
byte[] data = Files.readAllBytes(path);
If you need to do this more manually, you should use a FileInputStream - your code so far allocates an array, but doesn't read anything from the file.
To read just a portion of a file, you should look at using RandomAccessFile, which allows you to seek to wherever you want. Be aware that the read(byte[]) method does not guarantee to read all the requested data in one go, however. You should either loop until you've read everything you need, or use readFully instead. For example:
public static byte[] readPortion(File file, int offset, int length)
throws IOException {
byte[] data = new byte[length];
try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
raf.seek(offset);
raf.readFully(data);
}
return data;
}
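If you would rather loop yourself than rely on readFully, a minimal sketch could look like the following. The class name ManualReadLoop is only for illustration, and the method fails with an EOFException if the file ends early, which is one reasonable choice among several.

```java
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class ManualReadLoop {
    // Reads exactly `length` bytes starting at `offset`, looping because a
    // single read() call may return fewer bytes than were requested.
    public static byte[] readPortion(File file, long offset, int length)
            throws IOException {
        byte[] data = new byte[length];
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            int total = 0;
            while (total < length) {
                int n = raf.read(data, total, length - total);
                if (n == -1) {
                    // End of file reached before the requested range was filled.
                    throw new EOFException(
                            "Wanted " + length + " bytes but only got " + total);
                }
                total += n;
            }
        }
        return data;
    }
}
```

The loop keeps track of how much has been read so far and asks only for the remainder each time, so a short read never leaves a gap in the array.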
EDIT: Your update talks about seeing text such as [B@6e0b... That suggests you're calling toString() on a byte[] at some point. Don't do that. Instead, you should use new String(data, StandardCharsets.UTF_8) or something similar - picking the appropriate encoding, of course.
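To make the difference concrete, here is a small sketch (the class name ByteArrayText is made up for the example). Concatenating a byte[] into a String calls Object.toString(), which yields a type tag and a hash code rather than the contents, whereas decoding with a charset recovers the text.

```java
import java.nio.charset.StandardCharsets;

public class ByteArrayText {
    // Decodes the bytes as UTF-8 text; this is what recovers the original string.
    public static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // What string concatenation does with a byte[]: it calls Object.toString(),
    // producing "[B@" followed by a hash code that varies from run to run.
    public static String concatenated(byte[] data) {
        return "" + data;
    }

    public static void main(String[] args) {
        byte[] data = "str".getBytes(StandardCharsets.UTF_8);
        System.out.println(concatenated(data)); // starts with [B@
        System.out.println(decode(data));       // prints str
    }
}
```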
Related
Long string pasted into Eclipse is invisible
When pasting the following String into Eclipse, the text shows up as blank spaces: 1140002,1210002,1960001,2120002,2140001,3890001,6770002,6800002,7790002,9130002,10230002,12110002,12120002,13660002,14130001,14480002,15540001,15990002,16240002,16720002,16840002,16930002,17180002,18750001,19330001,35170001,39220001,41950001,42120001,43080001,54100001,56410001,65970001,82040001,84530001,84710001,84730001,85010001,85250001,85340001,85630001,85730001,85790001,85930001,85970001,86040001,86370001,86490001,86670001,86680001,86830001,86910001,86940001,87120001,90860001,93220001,97730001,98330001,107400001,110800001,118420001,118830001,118970001,121690001,121710001,122980001,125030001,125040001,125670001,125700001,125860001,125880001,128720001,129000001,130720001,131330001,135460001,140770001,141420001,141720001,142690001,145610001,970001,1400001,1530001,1760001,2020001,2270002,2890001,4150001,5780002,8430002,9150001,9970002,11780002,13860002,14160002,14240002,14490001,14500002,14530002,14850002,15290002,15560002,15690002,16300002,16620001,16660001,17200002,19580001,19790001,39760001,42010001,55540001,56640001,56910001,56920001,57230001,57390001,57420001,57600001,57860001,65690001,74550001,77280001,81340001,81880001,82100001,82920001,83200001,84280001,84350001,84790001,84970001,85260001,85380001,85700001,85980001,86050001,86590001,86600001,86660001,87150001,87360001,87550001,93110001,97540001,102430001,111200001,118880001,119020001,119970001,121700001,123780001,124940001,125000001,125450001,125760001,125790001,128690001,129180001,129980001,129990001,130000001,131340001,133430001,135340001,135470001,135480001,137040001,137140001,137490001,138130001,140050001,140800001,141970001,142460001,142860001,146730001,840002,2630001,3420001,5270001,7830002,9640002,9800002,10040002,10190002,12030002,13090001,14090002,15100002,15380002,15390002,15590002,15790002,15920002,16630001,16640002,17170002,17740001,19460001,55570001,57020001,57130001,57620001,57690001,65450001,66300001,68470001,68
680001,69250001,70510001,71930001,72060001,75220001,75890001,77810001,81540001,84870001,84880001,85000001,85130001,85270001,85320001,85410001,85510001,85580001,85750001,85770001,86090001,86110001,86290001,86300001,86460001,86510001,86750001,86770001,87060001,87340001,92850001,94320001,96850001,102900001,103390001,108940001,110710001,112550001,113020001,114550001,118380001,124410001,124840001,124850001,125050001,125780001,125870001,125900001,126690001,128750001,129050001,129270001,130170001,130700001,130730001,132470001,132830001,133480001,133570001,134780001,135930001,135990001,136220001,140060001,141150001,141590001,142480001,143090001,148280001,1200002,2300001,3790001,6870002,7840002,8380002,8420002,8890001,9930002,10030001,10870001,12340001,12680002,12920002,13410002,13520002,14070002,14200002,14280002,14360002,14970001,15310002,15700002,15880002,16310002,16380002,16450002,16750002,16780002,16850002,17610001,18560001,19370001,37820001,40370001,54050001,57000001,58020001,68850001,69740001,75290001,78650001,80290001,83690001,84490001,84580001,84600001,84630001,85120001,85180001,85420001,85670001,85780001,85830001,85870001,86080001,86320001,86390001,86400001,86820001,86920001,87040001,87890001,87910001,94400001,94550001,97030001,97170001,99630001,101570001,109360001,110650001,110860001,110880001,114480001,118930001,119010001,124500001,124520001,125010001,125320001,125340001,125530001,125690001,130760001,131360001,131370001,132910001,133100001,133410001,133530001,133660001,136080001,137070001,141410001,141690001,142470001,142840001,144240001,146680001,147720001,930001,1780001,2520001,5320001,6050002,7970002,8360002,10770002,11360002,13000002,13690002,14270002,14290002,15470002,15520001,15520002,15550002,15670001,15910002,16190002,16610001,16680001,16790002,16860002,16890001,19150001,31990001,35990001,36360001,40790001,41290001,41930001,56460001,56930001,57180001,57190001,65400001,68670001,75340001,76010001,77110001,77460001,83750001,84640001,84840001,86240001,8641000
1,86430001,86470001,86730001,86790001,86810001,86970001,89300001,93130001,93700001,94070001,97230001,97270001,98040001,100880001,109440001,109480001,114460001,116050001,116250001,117680001,118410001,122960001,122970001,124090001,125080001,125260001,125330001,125550001,125720001,129660001,131320001,133510001,133580001,136000001,138200001,140790001,141240001,141640001,142020001,142440001,144720001,146000001,990001,2270001,2730001,4090002,6340002,8360001,8390002,10290002,11750002,11970001,12640002,13990002,14040002,14250002,15370002,15500002,15770002,16020002,16370002,16900002,17940001,20610001,38190001,44740001,53780001,56390001,57240001,58000001,68460001,69560001,76640001,79280001,81330001,82960001,84570001,84620001,84720001,84740001,85140001,85240001,85400001,85430001,85470001,85480001,85720001,85920001,85940001,86230001,86880001,87080001,87090001,87330001,93090001,93150001,93160001,93250001,94520001,95080001,97210001,110260001,118540001,121180001,121240001,121490001,123810001,124550001,124890001,124920001,125220001,125380001,125500001,125890001,128990001,129500001,129730001,129970001,130710001,130740001,130750001,133370001,133740001,135160001,135890001,137130001,137620001,138180001,138190001,141390001,141710001,143060001,146670001,147640001,750001,990002,1000002,5090002,6460002,6520002,8030002,8320001,9390001,9520001,10840002,11460002,13060002,14140002,14300002,14350002,14370002,14790002,14840002,14940001,15050002,15630002,15860002,16100002,16630002,16650001,16670001,16670002,16700002,17270002,18530001,18710001,32430001,32730001,33310001,43140001,43150001,54090001,55580001,56230001,57060001,57100001,57340001,57440001,57560001,57750001,58120001,65940001,65990001,68480001,69410001,76560001,82860001,83890001,84610001,84910001,85190001,85200001,85330001,85360001,85490001,85540001,85820001,86060001,86520001,86720001,87350001,87580001,93190001,93480001,93870001,97640001,102490001,113010001,114470001,117430001,118960001,118980001,119000001,123140001,124960001,125060001,12
5070001,125250001,125310001,125430001,125510001,125680001,125730001,125770001,125910001,128930001,131390001,132020001,133490001,133500001,133550001,133600001,135450001,136020001,138210001,138600001,140740001,141570001,141660001,141670001,142140001,142450001,142620001,142630001,143120001,147730001,148490001,1100002,1900001,3200002,6760002,10050002,13700002,15030002,15780001,16260002,16650002,16950001,20480001,37830001,38640001,42030001,45300001,54040001,57090001,57580001,75450001,76920001,84130001,84220001,84800001,84810001,84850001,84940001,84980001,85070001,85080001,85220001,85310001,85600001,85840001,85890001,85910001,86650001,86860001,86890001,87050001,93070001,93440001,93750001,94250001,94980001,96360001,99620001,101400001,109000001,109340001,112210001,116140001,118990001,122790001,123200001,124390001,124930001,125410001,125540001,125710001,125740001,128610001,128780001,129040001,129320001,131400001,132270001,132940001,133440001,133670001,135440001,135880001,135900001,135980001,137050001,137060001,140170001,140780001,140970001,141380001,142130001,143020001,143210001,145920001,148300001 Interestingly, this only happens inside the main method of my program, but not when pasted outside the main class. Why is it that this long string becomes totally invisible when pasted inside the main method, and how can I go about pasting this string into my program?
I suppose in cases like this it would be wiser to just read a .txt file containing the desired data and then write it to a String. I was just curious as to why Eclipse doesn't accept long strings into Java programs. Anyway:
String myCurrentDir = System.getProperty("user.dir");
String textDir = myCurrentDir + "\\" + MyClass.class.getName().toString() + ".txt";
// System.out.println(textDir);
File f = new File(textDir);
FileInputStream fin = new FileInputStream(f);
byte[] buffer = new byte[(int) f.length()];
new DataInputStream(fin).readFully(buffer);
fin.close();
String commaString = new String(buffer, "UTF-8");
byte[] InputStream converted to String
This is my case: I'm using a library for reading files from a repository (I can't modify that library). The library has a method getContent that returns a String (it uses BasicResponseHandler to convert the response to a String), but the repository also contains binary files, and I need a byte[] to save those as files. I tried using content.getBytes("UTF-8") and it works with text files, but with other files such as images I get a corrupted file. BasicResponseHandler uses this to convert the input to a String (charset is UTF-8):
Reader reader = new InputStreamReader(instream, charset);
CharArrayBuffer buffer = new CharArrayBuffer(i);
try {
    char[] tmp = new char[1024];
    int l;
    while ((l = reader.read(tmp)) != -1) {
        buffer.append(tmp, 0, l);
    }
} finally {
    reader.close();
}
return buffer.toString();
Does anyone know what I can do?
When you read an image, that isn't a String, and it shouldn't be converted to one. Simply write the byte[] back out to a file, and you'll have the image stored in that file.
If you aren't able to edit the library code being used, I would suggest looking for a new library to use. Perhaps one that doesn't assume anything about the file content type.
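A short sketch of why the round trip through a String damages binary data (the class name BinaryRoundTrip is illustrative). Bytes that are not valid UTF-8 are replaced during decoding, so encoding the String again does not give back the original bytes. The example uses the first four bytes of a PNG file, whose first byte 0x89 is not valid on its own in UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryRoundTrip {
    // Decodes bytes as UTF-8 text and encodes the text back to bytes,
    // which is effectively what happens when binary content is forced
    // through a String and then getBytes("UTF-8").
    public static byte[] throughUtf8String(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] pngStart = {(byte) 0x89, 'P', 'N', 'G'};
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);

        // Valid UTF-8 text survives the round trip unchanged.
        System.out.println(Arrays.equals(text, throughUtf8String(text)));         // true
        // The invalid byte 0x89 is replaced, so the binary data comes back different.
        System.out.println(Arrays.equals(pngStart, throughUtf8String(pngStart))); // false
    }
}
```

Once the replacement has happened the original byte values are gone, which is why the fix has to be made where the response is first read, not afterwards.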
Java csv file unable to write string like 012365479
Hi, I wrote Java code to write output to a CSV file. This is the sample code:
File downloadPlace = new File(realContextPathFile, "general");
File gtwayDestRateFile = new File(downloadPlace, (new StringBuilder("ConnectionReport")).append(System.currentTimeMillis()).append(".csv").toString());
PrintWriter pw = new PrintWriter(new FileWriter(gtwayDestRateFile));
pw.print("Operator name,");
pw.print("Telephone Number,");
pw.print("Op1");
pw.print("012365479");
pw.print("Op2");
pw.print("09746");
pw.close();
p_response.setContentType("application/octet-stream");
p_response.setHeader("Content-Disposition", (new StringBuilder("attachment; filename=\"")).append(gtwayDestRateFile.getName()).append("\"").toString());
FileInputStream fis = new FileInputStream(gtwayDestRateFile);
byte buf[] = new byte[4096];
ServletOutputStream out = p_response.getOutputStream();
do {
    int n = fis.read(buf);
    if (n == -1)
        break;
    out.write(buf, 0, n);
} while (true);
fis.close();
out.flush();
In both cases the output is like this: 12365479 instead of 012365479, and 9746 instead of 09746. Can anyone tell me how I can solve this problem?
Are you sure that the file is written wrongly, and you're not just opening it in Excel which is interpreting these as numbers and thus losing the leading zeroes? Try opening it in a text editor.
If you write to System.out instead, you get:
Operator name,Telephone Number,Op1012365479Op209746
As you can see, the 0 is where you would expect. Perhaps the problem is that you don't have a , between fields. If you open such a file in Excel, it will remove the leading 0 because it assumes the field is a number. To avoid this, you need to put double quotes around the field so it is treated as text.
Read the file in a text editor, my guess is that it has the zero and what's reading it is thinking it's a number. Try putting quotes round it. pw.print("\"012365479\"");
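Putting the two suggestions together (commas between fields, quotes around each field), a small helper might look like the sketch below. The class and method names are made up for the example; the quoting follows the usual CSV convention of doubling embedded quotes. How a particular spreadsheet program then displays a quoted field is up to that program, so check the result in a text editor as well.

```java
public class CsvRow {
    // Wraps a field in double quotes and doubles any quotes inside it.
    public static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the quoted fields with commas to form one CSV line (no line ending).
    public static String row(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(row("Operator name", "Telephone Number"));
        System.out.println(row("Op1", "012365479"));
        System.out.println(row("Op2", "09746"));
    }
}
```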
How do I get the filename of a file inside a gzip in Java?
try {
    int BUFFER_SIZE = 4096;
    byte[] buffer = new byte[BUFFER_SIZE];
    InputStream input = new GZIPInputStream(new FileInputStream("a_gunzipped_file.gz"));
    OutputStream output = new FileOutputStream("current_output_name");
    int n = input.read(buffer, 0, BUFFER_SIZE);
    while (n >= 0) {
        output.write(buffer, 0, n);
        n = input.read(buffer, 0, BUFFER_SIZE);
    }
} catch (IOException e) {
    System.out.println("error: \n\t" + e.getMessage());
}
Using the above code I can successfully extract a gzip's contents, although the extracted file's name will, as expected, always be current_output_name (I know it's because I declared it that way in the code). My problem is that I don't know how to get the file's name while it is still inside the archive. Although java.util.zip provides a ZipEntry, I couldn't use it on gzip files. Any alternatives?
I mostly agree with Michael Borgwardt's reply, but it is not entirely true: the gzip file specification includes an optional file name stored in the header of the .gz file. Sadly, there is no way (as far as I know) of getting that name in current Java (1.6). As seen in the implementation of GZIPInputStream, in the method getHeader in OpenJDK, reading the file name is skipped:
// Skip optional file name
if ((flg & FNAME) == FNAME) {
    while (readUByte(in) != 0) ;
}
I have modified the GZIPInputStream class to get the optional filename out of the gzip archive (I'm not sure if I am allowed to do that) (download the original version from here). You only need to add a member String filename; to the class, and modify the above code to be:
// Skip optional file name
if ((flg & FNAME) == FNAME) {
    filename = "";
    int _byte = 0;
    while ((_byte = readUByte(in)) != 0) {
        filename += (char) _byte;
    }
}
and it worked for me.
Apache Commons Compress offers two options for obtaining the filename:
With metadata (Java 7+ sample code):
try (GzipCompressorInputStream gcis =
        new GzipCompressorInputStream(
            new FileInputStream("a_gunzipped_file.gz"))) {
    String filename = gcis.getMetaData().getFilename();
}
With "the convention":
String filename = GzipUtils.getUncompressedFilename("a_gunzipped_file.gz");
References: Apache Commons Compress, GzipCompressorInputStream. See also: GzipUtils#getUncompressedFilename
Actually, the GZIP file format, which is made up of one or more members, allows the original filename to be specified: a member whose FLG byte has the FNAME bit set carries the name in its header. I do not see a way to do this in the Java libraries, though. http://www.gzip.org/zlib/rfc-gzip.html#specification
Following the answers above, here is an example that creates a file "myTest.csv.gz" that contains a file "myTest.csv". Notice that you can't change the internal file name, and you can't add more files to the gz file.
@Test
public void gzipFileName() throws Exception {
    File workingFile = new File("target", "myTest.csv.gz");
    GZIPOutputStream gzipOutputStream = new GZIPOutputStream(new FileOutputStream(workingFile));
    PrintWriter writer = new PrintWriter(gzipOutputStream);
    writer.println("hello,line,1");
    writer.println("hello,line,2");
    writer.close();
}
Gzip is purely compression. There is no archive, it's just the file's data, compressed. The convention is for gzip to append .gz to the filename, and for gunzip to remove that extension. So, logfile.txt becomes logfile.txt.gz when compressed, and again logfile.txt when it's decompressed. If you rename the file, the name information is lost.
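For files that do carry the optional name in their header, it can be read without any library by walking the header fields in the order the gzip specification (RFC 1952) lays them out. The following is only a sketch under that assumption: the class name GzipHeaderName is made up, it inspects just the first member, and it returns null when no name is stored (which is the case for files written by java.util.zip.GZIPOutputStream).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class GzipHeaderName {
    private static final int FEXTRA = 0x04; // FLG bit: extra field present
    private static final int FNAME = 0x08;  // FLG bit: original file name present

    // Returns the original file name from the gzip header, or null if the
    // header does not contain one. Only the header is read, not the data.
    public static String readOriginalName(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readUnsignedByte() != 0x1f || data.readUnsignedByte() != 0x8b) {
            throw new IOException("Not a gzip stream");
        }
        data.readUnsignedByte();               // CM (compression method)
        int flags = data.readUnsignedByte();   // FLG
        data.readFully(new byte[6]);           // MTIME (4 bytes), XFL, OS
        if ((flags & FEXTRA) != 0) {
            // The extra field is prefixed by a little-endian 16-bit length.
            int lo = data.readUnsignedByte();
            int hi = data.readUnsignedByte();
            data.readFully(new byte[(hi << 8) | lo]);
        }
        if ((flags & FNAME) == 0) {
            return null;
        }
        // The name is zero-terminated and encoded as ISO 8859-1.
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = data.readUnsignedByte()) != 0) {
            name.write(b);
        }
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

A caller would open the file with a FileInputStream, pass it to readOriginalName, and fall back to stripping the .gz extension when the result is null.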
How can I store large amount of data from a database to XML (memory problem)?
First, I had a problem with getting the data from the database: it took too much memory and failed. I've set -Xmx1500M and I'm using a scrolling ResultSet, so that was taken care of. Now I need to make an XML file from the data, but I can't put it in one file. At the moment, I'm doing it like this:
while (rs.next()) {
    i++;
    xmlStringBuilder.append("\n\t<row>");
    xmlStringBuilder.append("\n\t\t<ID>" + Util.transformToHTML(rs.getInt("id")) + "</ID>");
    xmlStringBuilder.append("\n\t\t<JED_ID>" + Util.transformToHTML(rs.getInt("jed_id")) + "</JED_ID>");
    xmlStringBuilder.append("\n\t\t<IME_PJ>" + Util.transformToHTML(rs.getString("ime_pj")) + "</IME_PJ>");
    //etc.
    xmlStringBuilder.append("\n\t</row>");
    if (i % 100000 == 0) {
        //stores the data to a file with the name i.xml
        storeKBR(xmlStringBuilder.toString(), i);
        xmlStringBuilder = null;
        xmlStringBuilder = new StringBuilder();
    }
}
and it works; I get 12 files of 100 MB each. Now, what I'd like to do is have all that data in one file (which I then compress), but if I just remove the if part, I run out of memory. I thought about trying to write to a file, closing it, then opening it again, but that wouldn't get me much since I'd have to load the file into memory when I open it.
Why not write all data to one file and open the file with the "append" option? There is no need to read in all the data in the file if you are just going to write to it. However, this might be a better solution:
PrintWriter writer = new PrintWriter(new BufferedOutputStream(new FileOutputStream("data.xml")));
while (rs.next()) {
    i++;
    writer.print("\n\t<row>");
    writer.print("\n\t\t<ID>" + Util.transformToHTML(rs.getInt("id")) + "</ID>");
    writer.print("\n\t\t<JED_ID>" + Util.transformToHTML(rs.getInt("jed_id")) + "</JED_ID>");
    writer.print("\n\t\t<IME_PJ>" + Util.transformToHTML(rs.getString("ime_pj")) + "</IME_PJ>");
    //...
    writer.print("\n\t</row>");
}
writer.close();
The BufferedOutputStream will buffer the data before printing it, and you can specify the buffer size in the constructor if the default value does not suit your needs. See the Java API for details: http://java.sun.com/javase/6/docs/api/.
You are assembling the complete file in memory: what you should be doing is writing the data directly to the file. Additionally, you might consider using a proper XML API rather than assembling XML as a text file. A short tutorial is available here.
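As one possible reading of "a proper XML API", here is a sketch using the StAX XMLStreamWriter that ships with the JDK. It writes each row straight to the underlying Writer, so nothing accumulates in memory, and it escapes special characters itself. The class name StreamingXmlRows, the use of String[] rows in place of a ResultSet, and the choice of two columns are assumptions made to keep the example self-contained; the element names are taken from the question.

```java
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StreamingXmlRows {
    // Writes one <row> element per input row directly to `out`.
    // Each row is assumed to hold {id, ime_pj}; a real version would pull
    // these values from the ResultSet inside the loop instead.
    public static void writeRows(Writer out, Iterable<String[]> rows)
            throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument("UTF-8", "1.0");
        xml.writeStartElement("PROSTORNE_JEDINICE");
        for (String[] row : rows) {
            xml.writeStartElement("row");
            writeElement(xml, "ID", row[0]);
            writeElement(xml, "IME_PJ", row[1]);
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        // Closing the XMLStreamWriter flushes it but leaves `out` open,
        // so the caller still has to close the underlying Writer.
        xml.close();
    }

    private static void writeElement(XMLStreamWriter xml, String name, String text)
            throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(text); // escapes &, < and > as needed
        xml.writeEndElement();
    }
}
```

To write to disk, pass a BufferedWriter wrapped around an OutputStreamWriter with the UTF-8 charset, so the bytes match the encoding declared in the XML header.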
I have never encountered this use case, but I am pretty sure vtd-xml supports XML files larger than 1 GB. It is worth checking out at http://vtd-xml.sourceforge.net. Alternatively, you can follow the article series "Output large XML documents" at http://www.ibm.com/developerworks/.
OK, so the code is rewritten and I'll include the whole operation:
//this is the calling/writing function; I have 8 types of "proizvod" which makes
//8 XML files. After an XML file is created, it needs to be zipped by a custom zip class
generateXML(tmpParam, queryRBR, proizvod.getOznaka());
writeToZip(proizvod.getOznaka());

//inside writeToZip
ZipEntry ze = new ZipEntry(oznaka + ".xml");
FileOutputStream fos = new FileOutputStream(new File(zipFolder + oznaka + ".zip"));
ZipOutputStream zos = new ZipOutputStream(fos);
zos.putNextEntry(ze);
FileInputStream fis = new FileInputStream(new File(zipFolder + oznaka + ".xml"));
final byte[] buffer = new byte[1024];
int n;
while ((n = fis.read(buffer)) != -1)
    zos.write(buffer, 0, n);
zos.closeEntry();
zos.flush();
zos.close();
fis.close();

// inside generateXML
PrintWriter writer = new PrintWriter(new BufferedOutputStream(new FileOutputStream(zipFolder + oznaka + ".xml")));
writer.print("\n<?xml version=\"1.0\" encoding=\"UTF-8\" ?>");
writer.print("\n<PROSTORNE_JEDINICE>");
stmt = cm.getConnection().createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
String q = "";
rs = stmt.executeQuery(q);
if (rs != null) {
    System.out.println("Početak u : " + Util.nowTime());
    while (rs.next()) {
        writer.print("\n\t<row>");
        writer.print("\n\t\t<ID>" + Util.transformToHTML(rs.getInt("id")) + "</ID>");
        writer.print("\n\t\t<JED_ID>" + Util.transformToHTML(rs.getInt("jed_id")) + "</JED_ID>");
        //etc
        writer.print("\n\t</row>");
    }
    System.out.println("Kraj u : " + Util.nowTime());
}
writer.print("\n</PROSTORNE_JEDINICE>");
But the generateXML part still takes a lot of memory (if I'm guessing correctly, it takes as much as it can, bit by bit) and I don't see how I could optimize it (use an alternative way to feed the writer.print function)?