I have an example that works fine. With this example (provided below), I can detect the encoding of a file using the UniversalDetector framework from Mozilla.
But I want this example to detect the encoding of input rather than of a file, for example input read with the Scanner class. How can I modify the code below to detect the encoding of input instead of a file?
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import org.mozilla.universalchardet.UniversalDetector;
public class TestDetector {
public static void main(String[] args) throws java.io.IOException {
byte[] buf = new byte[4096];
java.io.FileInputStream fis = new java.io.FileInputStream("C:\\Users\\khalat\\Desktop\\Java\\toti.txt");
// (1)
UniversalDetector detector = new UniversalDetector(null);
// (2)
int nread;
while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
detector.handleData(buf, 0, nread);
}
// (3)
detector.dataEnd();
// (4)
String encoding = detector.getDetectedCharset();
if (encoding != null) {
System.out.println("Detected encoding = " + encoding);
} else {
System.out.println("No encoding detected.");
}
// (5)
detector.reset();
}
}
I found an elegant example that can at least test whether a character is ISO-8859-1; see the code below.
public class TestIso88591 {
public static void main(String[] args){
if(TestIso88591.testISO("ü")){
System.out.println("True");
}
else{
System.out.println("False");
}
}
public static boolean testISO(String text){
return Charset.forName(CharEncoding.ISO_8859_1).newEncoder().canEncode(text);
}
}
Now I have a question for the Java experts: is there a way to test whether a character is ISO-8859-5 or ISO-8859-7? Yes, I know there is UTF-8, but my exact question is how I can test for ISO-8859-5 characters, because the input data has to be stored in SAP, and SAP can only handle ISO-8859-1 characters. I need this as soon as possible.
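For the ISO-8859-5 and ISO-8859-7 part of the question, the canEncode check shown earlier can be pointed at a different charset name. The following is only a minimal sketch: the class name CharsetFit and the method fits are made up for illustration, and it assumes the running JVM ships the ISO-8859-5 and ISO-8859-7 charsets (common JDKs do, but only a handful of charsets are guaranteed). It tests whether a string can be represented in a charset, not which encoding some bytes were written in.

```java
import java.nio.charset.Charset;

class CharsetFit {
    // True if every character of text can be represented in the named charset.
    // Charset.forName throws UnsupportedCharsetException if the JVM lacks the charset.
    static boolean fits(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file independent of its own encoding.
        String cyrillic = "\u0416"; // Cyrillic capital letter ZHE
        String greek = "\u03A9";    // Greek capital letter OMEGA
        String german = "\u00FC";   // Latin small letter u with diaeresis

        System.out.println("ZHE in ISO-8859-5: " + fits(cyrillic, "ISO-8859-5"));
        System.out.println("ZHE in ISO-8859-1: " + fits(cyrillic, "ISO-8859-1"));
        System.out.println("OMEGA in ISO-8859-7: " + fits(greek, "ISO-8859-7"));
        System.out.println("u-diaeresis in ISO-8859-5: " + fits(german, "ISO-8859-5"));
    }
}
```

For the SAP case described above, the relevant call would be fits(input, "ISO-8859-1"); anything that returns false cannot be stored without loss.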
OK, I researched a bit more, and the result is this: it is useless to read bytes from stdin to guess the encoding, because the Java API lets you read the input directly as a string, which is already decoded ;) The only use case for this detector is when you get a stream of unknown bytes from a file, a socket, etc. and need to guess how to decode it into a Java string.
The following pseudocode is only a theoretical approach. But as we figured out, it makes no sense ;)
It's very simple.
byte[] buf = new byte[4096];
java.io.FileInputStream fis = new java.io.FileInputStream("C:\\Users\\khalat\\Desktop\\Java\\toti.txt");
UniversalDetector detector = new UniversalDetector(null);
int nread;
while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
detector.handleData(buf, 0, nread);
}
What you are doing here is reading from the file into a byte array, which is then passed to the detector.
Replace your FileInputStream with another input stream.
For example, to read everything from standard input:
byte[] buf = new byte[4096];
// Read raw bytes: a Reader would decode them into chars before the detector sees them.
java.io.InputStream in = System.in;
UniversalDetector detector = new UniversalDetector(null);
int nread;
while ((nread = in.read(buf)) > 0 && !detector.isDone()) {
    detector.handleData(buf, 0, nread);
}
ATTENTION!!
This code is not tested by me. It is only based on the Java API docs.
I would also place a BufferedInputStream between the input stream and the read, to buffer the input. Note that the 4096-byte buffer is only an upper limit: read returns as soon as some bytes are available, so the loop also runs for shorter input, and it ends when standard input reaches end of file or the detector is done.
About the Reader API: the base class java.io.Reader (http://docs.oracle.com/javase/7/docs/api/java/io/Reader.html#read(char[],%20int,%20int)) defines the method read(char[], int, int) as abstract, and any Reader-based implementation has to implement it. SO IT IS THERE!!! Note, though, that it fills a char[], not a byte[].
About not being able to figure out the encoding of a chunk of unknown bytes: yes, that's right. But you can make a guess, as the detector from Mozilla tries to do, because you have some clues:
1. We expect that the bytes are text.
2. We know which byte sequences are valid in each specified encoding.
3. We can try to decode several bytes in a guessed encoding and compare the resulting strings.
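Clue 3 can be sketched with the JDK's own CharsetDecoder: decode the bytes strictly in a candidate charset and see whether the decoder reports an error. This is not how UniversalDetector works internally (it uses statistical models); it is just a minimal illustration, and the class name StrictDecodeCheck and method decodesCleanly are made up for it. A clean decode only means the guess is possible, not that it is correct: ISO-8859-1, for instance, accepts every byte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeCheck {
    // True if the bytes form a valid sequence in the given charset.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xFC};            // u-diaeresis encoded as ISO-8859-1
        byte[] utf8 = {(byte) 0xC3, (byte) 0xBC}; // u-diaeresis encoded as UTF-8

        System.out.println(decodesCleanly(latin1, StandardCharsets.UTF_8));
        System.out.println(decodesCleanly(utf8, StandardCharsets.UTF_8));
        System.out.println(decodesCleanly(latin1, StandardCharsets.ISO_8859_1));
    }
}
```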
About us being experts:
Yes, most of us are ;) But we don't like doing someone else's homework. We like to fix bugs and give advice. So provide a full example that produces an error we can fix. Or, as happened here, we give you some advice with pseudocode. (I don't have the time to set up a project and write you a working example.)
Nice comment thread ;)
I need to be able to read the bytes from a file in android.
Everywhere I look, it seems that FileInputStream should be used to read bytes from a file but that is not what I want to do.
I want to be able to read a text file that contains (edit) a textual representation of byte-wide numeric values (/edit) that I want to save to an array.
An example of the text file I want to have converted to a byte array follows:
0x04 0xF2 0x33 0x21 0xAA
The final file will be much longer. FileInputStream returns the value of each character, whereas I want an array of length five holding the values listed above.
I want the array to be processed like:
ExampleArray[0] = (byte) 0x04;
ExampleArray[1] = (byte) 0xF2;
ExampleArray[2] = (byte) 0x33;
ExampleArray[3] = (byte) 0x21;
ExampleArray[4] = (byte) 0xAA;
Using FileInputStream on a text file returns the ASCII values of the characters and not the values I need written to the array.
The simplest solution is to use FileInputStream.read(byte[] a) method which will transfer the bytes from file into byte array.
Edit: It seems I've misread the requirements. So the file contains the text representation of bytes.
Scanner scanner = new Scanner(new FileInputStream(FILENAME));
String input;
while (scanner.hasNext()) {
input = scanner.next();
long number = Long.decode(input);
// do something with the value
}
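To get from the decoded values to the byte array the question asks for, the tokens can be collected in a ByteArrayOutputStream. This is a minimal sketch under a few assumptions: the class name HexTextParser and the method parse are made up, tokens are separated by whitespace, each token must fit in one byte (0x00 to 0xFF), and try-with-resources (Java 7 or later) is available.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Scanner;

class HexTextParser {
    // Parses whitespace-separated tokens such as "0x04 0xF2" into raw bytes.
    static byte[] parse(Readable source) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                String token = scanner.next();
                int value = Integer.decode(token); // understands the 0x prefix
                if (value < 0 || value > 0xFF) {
                    throw new IllegalArgumentException("Not a byte value: " + token);
                }
                out.write(value); // writes the low 8 bits
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = parse(new StringReader("0x04 0xF2 0x33 0x21 0xAA"));
        // Java bytes are signed, so 0xF2 and 0xAA show up as negative numbers.
        System.out.println(Arrays.toString(bytes)); // prints [4, -14, 51, 33, -86]
    }
}
```

To read from a file instead of a string, pass a FileReader (or an InputStreamReader around the FileInputStream) as the source.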
Old answer (obviously wrong for this case, but I'll leave it for posterity):
Use a FileInputStream's read(byte[]) method.
FileInputStream in = new FileInputStream(filename);
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead = in.read(buffer, 0, buffer.length);
You just don't store bytes as text. Never!
Because 0x00 can be written as one byte in a file, or as a string, in this case (hex) taking up 4 times more space.
If you're required to do this, discuss how awful this decision would be!
I will edit my answer if you can provide a sensible reason though.
You would only save stuff as actual text, if:
It is easier (not the case)
It adds value (if an increase in filesize by over 4 (spaces count) adds value, then yes)
If users should be able to edit the file (then you would omit the "0x"...)
You can write bytes like this:
public static void writeBytes(byte[] in, File file, boolean append) throws IOException {
FileOutputStream fos = null;
try {
fos = new FileOutputStream(file, append);
fos.write(in);
} finally {
if (fos != null)
fos.close();
}
}
and read like this:
public static byte[] readBytes(File file) throws IOException {
return readBytes(file, (int) file.length());
}
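public static byte[] readBytes(File file, int length) throws IOException {
    byte[] content = new byte[length];
    FileInputStream fis = null;
    try {
        fis = new FileInputStream(file);
        int offset = 0;
        // read() may return fewer bytes than requested, so keep filling from the current offset
        while (offset < length) {
            int read = fis.read(content, offset, length - offset);
            if (read < 0)
                throw new IOException("File is shorter than " + length + " bytes");
            offset += read;
        }
    } finally {
        if (fis != null)
            fis.close();
    }
    return content;
}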
and therefore have:
public static void writeString(String in, File file, String charset, boolean append)
throws IOException {
writeBytes(in.getBytes(charset), file, append);
}
public static String readString(File file, String charset) throws IOException {
return new String(readBytes(file), charset);
}
to write and read strings.
Note that I don't use the try-with-resource construct because Android's current Java source level is too low for that. :(
We are really stuck on this topic. This is the only code we have, which converts a file into hex, but we need to open a file and have the Java code read the hex and extract certain bytes (e.g. the first 4 bytes for the file extension):
import java.io.*;
public class FileInHexadecimal
{
public static void main(String[] args) throws Exception
{
FileInputStream fis = new FileInputStream("H://Sample_Word.docx");
int i = 0;
while ((i = fis.read()) != -1) {
if (i != -1) {
System.out.printf("%02X\n ", i);
}
}
fis.close();
}
}
Do not confuse internal and external representation - what you do when converting to hex is that you only create a different representation of the same bytes.
There is no need to convert to hex if you just want to read some bytes from the file - just read them. For example, to read the first four bytes, you can use something like
byte[] buffer = new byte[4];
FileInputStream fis = new FileInputStream("H://Sample_Word.docx");
int read = fis.read(buffer);
if (read != buffer.length) {
System.out.println("Short file!");
}
If you need to read data from an arbitrary position within the file, you might want to check RandomAccessFile instead of using a stream. RandomAccessFile allows you to set the position where reading starts.
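A minimal sketch of the RandomAccessFile route: seek to a position and read a fixed number of bytes. The class name RandomAccessRead and the method readAt are made up for illustration, and the demo writes its own temporary file rather than assuming any particular file exists.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class RandomAccessRead {
    // Reads exactly count bytes starting at the given file position.
    static byte[] readAt(String path, long position, int count) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(position);
            byte[] buffer = new byte[count];
            raf.readFully(buffer); // throws EOFException if fewer than count bytes remain
            return buffer;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".bin");
        try {
            Files.write(tmp, new byte[] {10, 20, 30, 40, 50, 60});
            byte[] middle = readAt(tmp.toString(), 2, 3);
            System.out.println(Arrays.toString(middle)); // prints [30, 40, 50]
        } finally {
            Files.delete(tmp);
        }
    }
}
```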
The documentation says that one should not use available() method to determine the size of an InputStream. How can I read the whole content of an InputStream into a byte array?
InputStream in; //assuming already present
byte[] data = new byte[in.available()];
in.read(data);//now data is filled with the whole content of the InputStream
I could read multiple times into a buffer of a fixed size, but then, I will have to combine the data I read into a single byte array, which is a problem for me.
The simplest approach IMO is to use Guava and its ByteStreams class:
byte[] bytes = ByteStreams.toByteArray(in);
Or for a file:
byte[] bytes = Files.toByteArray(file);
Alternatively (if you didn't want to use Guava), you could create a ByteArrayOutputStream, and repeatedly read into a byte array and write into the ByteArrayOutputStream (letting that handle resizing), then call ByteArrayOutputStream.toByteArray().
Note that this approach works whether you can tell the length of your input or not - assuming you have enough memory, of course.
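The ByteArrayOutputStream alternative described above might look like the following. It is a minimal sketch; the class name StreamBytes, the method readAll, and the 8192-byte chunk size are arbitrary choices for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamBytes {
    // Reads the stream to its end and returns everything as one array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // grows as needed
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n); // copy only the bytes actually read
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[20000]; // larger than one chunk
        byte[] copy = readAll(new ByteArrayInputStream(source));
        System.out.println(copy.length + " bytes read");
    }
}
```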
Please keep in mind that the answers here assume that the length of the file is less than or equal to Integer.MAX_VALUE (2147483647).
If you are reading in from a file, you can do something like this:
File file = new File("myFile");
byte[] fileData = new byte[(int) file.length()];
DataInputStream dis = new DataInputStream(new FileInputStream(file));
dis.readFully(fileData);
dis.close();
UPDATE (May 31, 2014):
Java 7 adds some new features in the java.nio.file package that can be used to make this example a few lines shorter. See the readAllBytes() method in the java.nio.file.Files class. Here is a short example:
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
// ...
Path p = FileSystems.getDefault().getPath("", "myFile");
byte [] fileData = Files.readAllBytes(p);
Android has support for this starting in Api level 26 (8.0.0, Oreo).
You can use Apache commons-io for this task:
Refer to this method:
public static byte[] readFileToByteArray(File file) throws IOException
Update:
Java 7 way:
byte[] bytes = Files.readAllBytes(Paths.get(filename));
and if it is a text file and you want to convert it to String (change encoding as needed):
StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString()
You can read it by chunks (byte buffer[] = new byte[2048]) and write the chunks to a ByteArrayOutputStream. From the ByteArrayOutputStream you can retrieve the contents as a byte[], without needing to determine its size beforehand.
I believe buffer length needs to be specified, as memory is finite and you may run out of it
Example:
File fileFileName = new File(strFileName);
InputStream in = new FileInputStream(fileFileName);
long length = fileFileName.length();
if (length > Integer.MAX_VALUE) {
throw new IOException("File is too large!");
}
byte[] bytes = new byte[(int) length];
int offset = 0;
int numRead = 0;
while (offset < bytes.length && (numRead = in.read(bytes, offset, bytes.length - offset)) >= 0) {
offset += numRead;
}
if (offset < bytes.length) {
throw new IOException("Could not completely read file " + fileFileName.getName());
}
in.close();
The maximum array length is bounded by Integer.MAX_VALUE, which is about 2 GB for a byte array (2^31 - 1 = 2,147,483,647).
Your input stream can be bigger than 2 GB, so you have to process the data in chunks, sorry.
InputStream is;
final byte[] buffer = new byte[512 * 1024 * 1024]; // 512Mb
while(true) {
final int read = is.read(buffer);
if ( read < 0 ) {
break;
}
// do processing
}
Looking to read in some bytes over a socket using an inputStream. The bytes sent by the server may be of variable quantity, and the client doesn't know in advance the length of the byte array. How may this be accomplished?
byte b[];
sock.getInputStream().read(b);
This causes a 'might not be initialized' error in NetBeans. Help.
You need to expand the buffer as needed, by reading in chunks of bytes, 1024 at a time as in this example code I wrote some time ago
byte[] resultBuff = new byte[0];
byte[] buff = new byte[1024];
int k = -1;
while((k = sock.getInputStream().read(buff, 0, buff.length)) > -1) {
byte[] tbuff = new byte[resultBuff.length + k]; // temp buffer size = bytes already read + bytes last read
System.arraycopy(resultBuff, 0, tbuff, 0, resultBuff.length); // copy previous bytes
System.arraycopy(buff, 0, tbuff, resultBuff.length, k); // copy current lot
resultBuff = tbuff; // call the temp buffer as your result buff
}
System.out.println(resultBuff.length + " bytes read.");
return resultBuff;
Assuming the sender closes the stream at the end of the data:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buf = new byte[4096];
while(true) {
int n = is.read(buf);
if( n < 0 ) break;
baos.write(buf,0,n);
}
byte data[] = baos.toByteArray();
Read an int, which is the size of the next segment of data being received. Create a buffer with that size, or use a roomy pre-existing buffer. Read into the buffer, making sure it is limited to the aforeread size. Rinse and repeat :)
If you really don't know the size in advance as you said, read into an expanding ByteArrayOutputStream as the other answers have mentioned. However, the size method really is the most reliable.
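The size-prefix approach can be sketched with DataOutputStream and DataInputStream. This assumes you control both ends of the connection and that both agree on a 4-byte big-endian length followed by the payload; the class name LengthPrefixed and the methods writeMessage and readMessage are made up for illustration. The demo uses in-memory streams in place of a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class LengthPrefixed {
    // Sender side: 4-byte length, then the payload.
    static void writeMessage(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    // Receiver side: read the length, then exactly that many bytes.
    static byte[] readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt(); // throws EOFException if the stream has ended
        if (length < 0) {
            throw new IOException("Negative message length: " + length);
        }
        byte[] payload = new byte[length];
        data.readFully(payload); // loops internally until the buffer is full
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeMessage(wire, new byte[] {1, 2, 3});
        writeMessage(wire, new byte[] {4, 5});

        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        System.out.println(readMessage(in).length); // prints 3
        System.out.println(readMessage(in).length); // prints 2
    }
}
```

With a real socket, pass sock.getOutputStream() and sock.getInputStream() instead. In production code you would probably also cap the accepted length so a corrupt header cannot trigger a huge allocation.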
Without re-inventing the wheel, using Apache Commons:
IOUtils.toByteArray(inputStream);
For example, complete code with error handling:
public static byte[] readInputStreamToByteArray(InputStream inputStream) {
if (inputStream == null) {
// normally, the caller should check for null after getting the InputStream object from a resource
throw new FileProcessingException("Cannot read from InputStream that is NULL. The resource requested by the caller may not exist or was not looked up correctly.");
}
try {
return IOUtils.toByteArray(inputStream);
} catch (IOException e) {
throw new FileProcessingException("Error reading input stream.", e);
} finally {
closeStream(inputStream);
}
}
private static void closeStream(Closeable closeable) {
try {
if (closeable != null) {
closeable.close();
}
} catch (Exception e) {
throw new FileProcessingException("IO Error closing a stream.", e);
}
}
Where FileProcessingException is your app-specific meaningful RT exception that will travel uninterrupted to your proper handler w/o polluting the code in between.
The simple answer is:
byte b[] = new byte[BIG_ENOUGH];
int nosRead = sock.getInputStream().read(b);
where BIG_ENOUGH is big enough.
But in general there is a big problem with this. A single read call is not guaranteed to return all that the other end has written.
If the nosRead value is BIG_ENOUGH, your application has no way of knowing for sure if there are more bytes to come; the other end may have sent exactly BIG_ENOUGH bytes ... or more than BIG_ENOUGH bytes. In the former case, your application will block (forever) if you try to read. In the latter case, your application has to do (at least) another read to get the rest of the data.
If the nosRead value is less than BIG_ENOUGH, your application still doesn't know. It might have received everything there is, part of the data may have been delayed (due to network packet fragmentation, network packet loss, network partition, etc), or the other end might have blocked or crashed part way through sending the data.
The best answer is that EITHER your application needs to know beforehand how many bytes to expect, OR the application protocol needs to somehow tell the application how many bytes to expect or when all bytes have been sent.
Possible approaches are:
the application protocol uses fixed message sizes (not applicable to your example)
the application protocol message sizes are specified in message headers
the application protocol uses end-of-message markers
the application protocol is not message based, and the other end closes the connection to say that is the end.
Without one of these strategies, your application is left to guess, and is liable to get it wrong occasionally.
Then you use multiple read calls and (maybe) multiple buffers.
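As one illustration of the strategies listed above, here is a minimal sketch of the end-of-message-marker approach. It assumes the protocol reserves a single byte value (a newline in the demo) that never occurs inside a message; the class name DelimitedReader and the method readUntil are made up. Reading one byte at a time is slow on an unbuffered stream, so wrapping the socket stream in a BufferedInputStream would be sensible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class DelimitedReader {
    // Reads bytes up to (not including) the delimiter and returns them.
    static byte[] readUntil(InputStream in, int delimiter) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == delimiter) {
                return out.toByteArray();
            }
            out.write(b);
        }
        // The other end closed the connection in the middle of a message.
        throw new EOFException("Stream ended before the delimiter was seen");
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "hello\nworld\n".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(wire);
        System.out.println(new String(readUntil(in, '\n'), StandardCharsets.US_ASCII)); // prints hello
        System.out.println(new String(readUntil(in, '\n'), StandardCharsets.US_ASCII)); // prints world
    }
}
```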
Stream all Input data into Output stream. Here is working example:
InputStream inputStream = null;
byte[] tempStorage = new byte[1024];//try to read 1Kb at time
int bLength;
try{
ByteArrayOutputStream outputByteArrayStream = new ByteArrayOutputStream();
if (fileName.startsWith("http"))
inputStream = new URL(fileName).openStream();
else
inputStream = new FileInputStream(fileName);
while ((bLength = inputStream.read(tempStorage)) != -1) {
outputByteArrayStream.write(tempStorage, 0, bLength);
}
outputByteArrayStream.flush();
//Here is the byte array at the end
byte[] finalByteArray = outputByteArrayStream.toByteArray();
outputByteArrayStream.close();
inputStream.close();
}catch(Exception e){
e.printStackTrace();
if (inputStream != null) inputStream.close();
}
Either:
Have the sender close the socket after transferring the bytes. Then at the receiver just keep reading until EOS.
Have the sender prefix a length word as per Chris's suggestion, then read that many bytes.
Use a self-describing protocol such as XML, Serialization, ...
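Use BufferedInputStream and its available() method, which returns an estimate of the number of bytes that can be read without blocking (not necessarily everything the sender will send), and then construct a byte[] with that size. For data that has already arrived, that solves the problem. :)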
BufferedInputStream buf = new BufferedInputStream(is);
int size = buf.available();
Here is a simpler example using ByteArrayOutputStream...
socketInputStream = socket.getInputStream();
int expectedDataLength = 128; //todo - set accordingly/experiment. Does not have to be precise value.
ByteArrayOutputStream baos = new ByteArrayOutputStream(expectedDataLength);
byte[] chunk = new byte[expectedDataLength];
int numBytesJustRead;
while((numBytesJustRead = socketInputStream.read(chunk)) != -1) {
baos.write(chunk, 0, numBytesJustRead);
}
return baos.toString("UTF-8");
However, if the server does not close the connection (so read never returns -1), you will need to detect the end of the data some other way, e.g., maybe the returned content always ends with a certain marker (e.g., ""), or you could possibly solve it using socket.setSoTimeout(). (Mentioning this as it seems to be a common problem.)
This is both a late answer and self-advertising, but anyone checking out this question may want to take a look here:
https://github.com/GregoryConrad/SmartSocket
This question is 7 years old, but I had a similar problem while making an NIO- and OIO-compatible system (client and server may each be whatever they want, OIO or NIO).
This was quite the challenge because of the blocking InputStreams.
I found a way that makes it possible, and I want to post it to help people with similar problems.
Reading a byte array of dynamic size is done here with a DataInputStream, which can simply be wrapped around the socket's InputStream. Also, I do not want to introduce a specific communication protocol (like first sending the number of bytes that will be sent), because I want to keep this as vanilla as possible. First off, I have a simple utility Buffer class, which looks like this:
import java.util.ArrayList;
import java.util.List;
public class Buffer {
private byte[] core;
private int capacity;
public Buffer(int size){
this.capacity = size;
clear();
}
public List<Byte> list() {
final List<Byte> result = new ArrayList<>();
for(byte b : core) {
result.add(b);
}
return result;
}
public void reallocate(int capacity) {
this.capacity = capacity;
}
public void teardown() {
this.core = null;
}
public void clear() {
core = new byte[capacity];
}
public byte[] array() {
return core;
}
}
This class only exists because of the dumb way byte <=> Byte autoboxing in Java works with this List. It is not really needed in this example, but I did not want to leave anything out of this explanation.
Next up are the two simple core methods. In them, a StringBuilder is used as a "callback": it is filled with the data that has been read, and the number of bytes read is returned. This could of course be done differently.
private int readNext(StringBuilder stringBuilder, Buffer buffer) throws IOException {
// Attempt to read up to the buffers size
int read = in.read(buffer.array());
// If EOF is reached (-1 read)
// we disconnect, because the
// other end disconnected.
if(read == -1) {
disconnect();
return -1;
}
// Add the read byte[] as
// a String to the stringBuilder.
stringBuilder.append(new String(buffer.array()).trim());
buffer.clear();
return read;
}
private Optional<String> readBlocking() throws IOException {
final Buffer buffer = new Buffer(256);
final StringBuilder stringBuilder = new StringBuilder();
// This call blocks. Therefor
// if we continue past this point
// we WILL have some sort of
// result. This might be -1, which
// means, EOF (disconnect.)
if(readNext(stringBuilder, buffer) == -1) {
return Optional.empty();
}
while(in.available() > 0) {
buffer.reallocate(in.available());
if(readNext(stringBuilder, buffer) == -1) {
return Optional.empty();
}
}
buffer.teardown();
return Optional.of(stringBuilder.toString());
}
The first method, readNext, fills the buffer with bytes from the DataInputStream and returns the number of bytes read.
In the second method, readBlocking, I use the blocking nature of the read so that I do not have to worry about producer-consumer problems. readBlocking simply blocks until a new byte array is received. Before we call the blocking method, we allocate a buffer size. Note that I call reallocate after the first read (inside the while loop). This is not needed; you can safely delete this line and the code will still work. I did it because of the specifics of my problem.
The two things I did not explain in more detail are:
1. in (the DataInputStream, and the only short variable name here, sorry for that)
2. disconnect (your disconnect routine)
All in all, you can now use it this way:
// in has to be a field, or a parameter of the readBlocking method
DataInputStream in = new DataInputStream(socket.getInputStream());
final Optional<String> rawDataOptional = readBlocking();
rawDataOptional.ifPresent(string -> threadPool.execute(() -> handle(string)));
This gives you a way of reading byte arrays of any shape or form over a socket (or any InputStream, really). Hope this helps!