https://github.com/essiembre/eclipse-rbe/issues/83
eclipse-rbe
In Eclipse I have this file, messagesBundle_ru_RU.properties:
#Generated by ResourceBundle Editor (http://essiembre.github.io/eclipse-rbe/)
#Generated by ResourceBundle Editor (http://eclipse-rbe.sourceforge.net)
#Created by JInto - www.guh-software.de
#Sun Nov 18 17:19:12 EET 2012
ABS = \u0410\u0411\u0421
About = \u041E \u043F\u0440\u043E\u0433\u0440\u0430\u043C\u043C\u0435
Add = \u0414\u043E\u0431\u0430\u0432\u043B\u0435\u043D\u0438\u0435 \u0437\u0430\u043F\u0438\u0441\u0438
Add_Condition = \u0414\u043E\u0431\u0430\u0432\u0438\u0442\u044C \u0443\u0441\u043B\u043E\u0432\u0438\u0435
Additional = \u0414\u043E\u043F\u043E\u043B\u043D\u0438\u0442\u0435\u043B\u044C\u043D\u043E
How can I convert this to readable text in UTF-8 format?
This is a properties file, and when it was saved to an OutputStream, any character outside of the ISO-8859-1 character set was replaced with a Unicode escape. The Properties.load(InputStream) method will decode this for you. You can then save the properties to a new file, specifying UTF-8 encoding.
static void transcodeProperties(Path src, Path dst) throws IOException {
Properties properties = new Properties();
try (InputStream fis = Files.newInputStream(src);
BufferedInputStream is = new BufferedInputStream(fis)) {
properties.load(is);
}
try (Writer w = Files.newBufferedWriter(dst, StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW)) {
properties.store(w, null);
}
}
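To see the decoding step in isolation, here is a minimal sketch that loads one escaped entry from a string instead of a file. The class name EscapeDemo and the helper decode are made up for illustration; only Properties.load(Reader) and getProperty come from the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class EscapeDemo {
    // Hypothetical helper: parses properties text and returns the decoded value for one key.
    static String decode(String propertiesText, String key) throws IOException {
        Properties properties = new Properties();
        // load() replaces the backslash-u escapes with the characters they stand for.
        properties.load(new StringReader(propertiesText));
        return properties.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // The doubled backslashes put the escapes into the string exactly as they appear in the file.
        System.out.println(decode("ABS = \\u0410\\u0411\\u0421", "ABS"));
    }
}
```

The returned String holds the real Cyrillic characters, so writing it out with a UTF-8 Writer, as transcodeProperties does, gives readable text.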
I wrote a method that does what you asked. Keep in mind that I used methods from Apache; I extracted the required ones so that you are not forced to use that library if you do not want to.
public static void translateFileUnicode(File input, File output) {
    // Escaped .properties files are ISO-8859-1; the result is written as UTF-8.
    try (BufferedReader reader = Files.newBufferedReader(input.toPath(), StandardCharsets.ISO_8859_1);
         BufferedWriter writer = Files.newBufferedWriter(output.toPath(), StandardCharsets.UTF_8)) {
        String line;
        while ((line = reader.readLine()) != null) {
            // Decode the escapes line by line and write the result straight out.
            writer.write(translate(line));
            writer.write('\n');
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
public static String translate(final CharSequence input) {
if (input == null) {
return null;
}
try {
final StringWriter writer = new StringWriter(input.length() * 2);
translate(input, writer);
return writer.toString();
} catch (final IOException ioe) {
throw new UncheckedIOException(ioe);
}
}
public static void translate(final CharSequence input, final Writer writer) throws IOException {
Objects.requireNonNull(writer, "writer");
if (input == null) {
return;
}
int pos = 0;
final int len = input.length();
while (pos < len) {
final int consumed = translateUnicode(input, pos, writer);
if (consumed == 0) {
final char c1 = input.charAt(pos);
writer.write(c1);
pos++;
if (Character.isHighSurrogate(c1) && pos < len) {
final char c2 = input.charAt(pos);
if (Character.isLowSurrogate(c2)) {
writer.write(c2);
pos++;
}
}
continue;
}
for (int pt = 0; pt < consumed; pt++) {
pos += Character.charCount(Character.codePointAt(input, pos));
}
}
}
public static int translateUnicode(final CharSequence input, final int index, final Writer out) throws IOException {
if (input.charAt(index) == '\\' && index + 1 < input.length() && input.charAt(index + 1) == 'u') {
int i = 2;
while (index + i < input.length() && input.charAt(index + i) == 'u') {
i++;
}
if (index + i < input.length() && input.charAt(index + i) == '+') {
i++;
}
if (index + i + 4 <= input.length()) {
final CharSequence unicode = input.subSequence(index + i, index + i + 4);
try {
final int value = Integer.parseInt(unicode.toString(), 16);
out.write((char) value);
} catch (final NumberFormatException nfe) {
throw new IllegalArgumentException("Unable to parse unicode value: " + unicode, nfe);
}
return i + 4;
}
throw new IllegalArgumentException("Less than 4 hex digits in unicode value: '" + input.subSequence(index, input.length())
+ "' due to end of CharSequence");
}
return 0;
}
If you wish to translate the unicode characters from one file into another all at once you can use the translateFileUnicode(File, File) method. If you wish to translate a single String you can use the translate(CharSequence) method. I hope this is what you were looking for.
I have two methods, both using FileInputStream Objects.
The First one returns expected value. This method works fine.
But the Second method returns nothing. The value passed to the second method is not null.
I need to get the hexadecimal format of the files passed to methods.
Why is it so? Kindly Explain.
Here is my code
public String binaryFile1(File file1){
try{
stringBuilder1=new StringBuilder();
is1=new FileInputStream(file1);
while(b!=-1){
counter++;
b=is1.read();
String s = Integer.toHexString(b).toUpperCase();
if (s.length() == 1) {
stringBuilder1.append('0');
}
if(counter%5==0){
stringBuilder1.append(s).append("\n");
counter=0;
}else
stringBuilder1.append(s).append(' ');
}
is1.close();
}catch(Exception e){
e.printStackTrace();
}
return stringBuilder1.toString();
}
public String binaryFile2(File file2){
try{
stringBuilder2=new StringBuilder();
is2=new FileInputStream(file2);
while(b!=-1){
counter++;
b=is2.read(); //Here b does not get any content assigned.
String s = Integer.toHexString(b).toUpperCase();
if (s.length() == 1) {
stringBuilder2.append('0');
}
if(counter%5==0){
stringBuilder2.append(s).append("\n");
counter=0;
}else
stringBuilder2.append(s).append(' ');
}
is2.close();
}catch(Exception e){
e.printStackTrace();
}
return stringBuilder2.toString(); //Here stringBuilder2 is null
}
Since b is shared and you don't reset it after binaryFile1, it's still -1 at the start of binaryFile2, so the loop there never runs. I suggest you use:
int b;
while ((b = is2.read()) != -1) {
// ...
}
Edit
It is important to close your resources when you're done. I also suggest you try and limit variable scope as much as possible. Using try-with-resources you could write binaryFile2 like
public String binaryFile2(File file) {
StringBuilder sb = new StringBuilder();
int counter = 0;
try (InputStream is = new FileInputStream(file)) {
int b;
while ((b = is.read()) != -1) {
counter++;
String s = Integer.toHexString(b).toUpperCase();
if (s.length() == 1) {
sb.append('0');
}
sb.append(s);
if (counter % 5 == 0) {
sb.append(System.lineSeparator());
counter = 0;
} else {
sb.append(' ');
}
}
} catch (Exception e) {
e.printStackTrace();
}
return sb.toString();
}
I'm having some problems with an InputStream. I'm writing a little Android application and part of it has to fetch HTML code from a website. Generally, it works fine, but sometimes (usually the second time it's called, but it may also take a few tries to reproduce this) it will just skip the InputStream (I noticed this since it takes a few seconds while debugging, but every time it fails it will just immediately skip to the next line). Any ideas what could be causing this and how to fix it?
private class fetchdata extends AsyncTask<Void, Void, Void> {
public Activity activity;
public fetchdata(Activity a)
{
activity = a;
}
protected Void doInBackground(Void...voids)
{
String[] page = new String[16384]; //Number is just for testing, don't worry
try {
page = executeHttpGet();
} catch (Exception e) {
page[0] = "Error";
}
displayFetchedData(page);
return null;
}
public String[] executeHttpGet() throws Exception {
URL u;
InputStream is = null;
DataInputStream dis = null;
String s;
int i = 0;
int hostselection;
boolean skip;
String[] page = new String[16384];
String[] serverurls = new String[2];
addSecurityException();
SharedPreferences dataprefs = getSharedPreferences("serverdata", Context.MODE_PRIVATE);
hostselection = dataprefs.getInt("selectedhost", 0);
SharedPreferences preferences;
preferences = PreferenceManager.getDefaultSharedPreferences(activity);
serverurls[0] = preferences.getString("server01", "");
serverurls[1] = preferences.getString("server02", "");
for (int j = 0; j < 2; j++)
{
skip = false;
if (j == 0)
{
if (hostselection == 0 || hostselection == 1)
{
Authenticator.setDefault(new MyAuthenticator(activity, false));
}
else
{
skip = true;
}
}
if (j == 1)
{
if (hostselection == 0 || hostselection == 2)
{
Authenticator.setDefault(new MyAuthenticator(activity, true));
}
else
{
skip = true;
}
}
if (skip == false)
{
try {
u = new URL(serverurls[j]);
is = u.openStream(); //LINE IN QUESTION
dis = new DataInputStream(new BufferedInputStream(is));
while ((s = dis.readLine()) != null)
{
if (s.length() > 18)
{
page[i] = s;
i++;
}
}
}
catch (IOException ioe)
{
ioe.printStackTrace();
}
is.close();
}
}
return page;
}
Create a BufferedInputStream from the input stream you get, then call mark() with the stream length as the read limit. Call reset() when you need to reuse the stream.
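A minimal sketch of the mark()/reset() pattern, using an in-memory stream as a stand-in for the network stream. The class name MarkResetDemo and the helpers drain and readTwice are illustrative only. Keep in mind that reset() rewinds a stream that is still open, and only while no more than the read limit has been consumed; it does not reopen a connection.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class MarkResetDemo {
    // Reads everything that is left in the stream and returns it as text.
    static String drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Hypothetical helper: reads the same content twice using mark() and reset().
    static String readTwice(InputStream raw, int readLimit) throws IOException {
        try (BufferedInputStream in = new BufferedInputStream(raw)) {
            in.mark(readLimit);        // remember the current position
            String first = drain(in);
            in.reset();                // rewind to the marked position
            String second = drain(in);
            return first + "|" + second;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream demo = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readTwice(demo, 1024));
    }
}
```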
Unrelated, but you aren't closing the DataInputStream.
Tell us more about the skipping. Is an exception raised? Is it possible that, when you run it outside of debug mode, it is referencing stale class files? The only thing I can imagine is that your debug and normal classes somehow differ.
I wonder whether there is any Java library that can generate POI data for Tomtom navigation devices (the file usually has an .ov2 extension).
I use the makeov2.exe utility from Tomtom, but it is not stable and seems to be no longer supported.
I wasn't able to find a library that does writing, although I did find this class to read .ov2 files:
package readers;
import java.io.FileInputStream;
import java.io.IOException;
public class OV2RecordReader {
public static String[] readOV2Record(FileInputStream inputStream){
String[] record = null;
int b = -1;
try{
if ((b = inputStream.read())> -1) {
// if it is a simple POI record
if (b == 2) {
record = new String[3];
long total = readLong(inputStream);
double longitude = (double) readLong(inputStream) / 100000.0;
double latitude = (double) readLong(inputStream) / 100000.0;
byte[] r = new byte[(int) total - 13];
inputStream.read(r);
record[0] = new String(r);
record[0] = record[0].substring(0,record[0].length()-1);
record[1] = Double.toString(latitude);
record[2] = Double.toString(longitude);
}
//if it is a deleted record
else if(b == 0){
byte[] r = new byte[9];
inputStream.read(r);
}
//if it is a skipper record
else if(b == 1){
byte[] r = new byte[20];
inputStream.read(r);
}
else{
throw new IOException("wrong record type");
}
}
else{
return null;
}
}
catch(IOException e){
e.printStackTrace();
}
return record;
}
private static long readLong(FileInputStream is){
long res = 0;
try{
res = is.read();
res += is.read() <<8;
res += is.read() <<16;
res += is.read() <<24;
}
catch(IOException e){
e.printStackTrace();
}
return res;
}
}
I also found this PHP code to write the file:
<?php
$csv = file("File.csv");
$nbcsv = count($csv);
$file = "POI.ov2";
$fp = fopen($file, "w");
for ($i = 0; $i < $nbcsv; $i++) {
$table = split(",", chop($csv[$i]));
$lon = $table[0];
$lat = $table[1];
$des = $table[2];
$TT = chr(0x02).pack("V",strlen($des)+14).pack("V",round($lon*100000)).pack("V",round($lat*100000)).$des.chr(0x00);
@fwrite($fp, "$TT");
}
fclose($fp);
I'm not sure how you'd go about writing a Java class (or extending the one above) to write the file like the PHP function does, but you may be able to get some insight into how the file is encoded from it.
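For what it's worth, here is a rough Java sketch of the same record layout the PHP snippet produces: a type byte of 2, the total record length, longitude and latitude as little-endian 32-bit integers scaled by 100000, then the name and a terminating zero byte. The class name Ov2RecordWriter and both method names are made up, and the ISO-8859-1 text encoding is an assumption, since neither snippet above states which encoding the devices expect.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class Ov2RecordWriter {
    // Encodes one simple POI record (type 2) with the layout used by the PHP snippet.
    static byte[] encodeRecord(double longitude, double latitude, String name) {
        // Assumption: a single-byte encoding; the source does not say which one is expected.
        byte[] text = name.getBytes(StandardCharsets.ISO_8859_1);
        // 1 type byte + 4 length + 4 longitude + 4 latitude + text + 1 terminator
        int total = text.length + 14;
        ByteBuffer buf = ByteBuffer.allocate(total).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 2);
        buf.putInt(total);
        buf.putInt((int) Math.round(longitude * 100000));
        buf.putInt((int) Math.round(latitude * 100000));
        buf.put(text);
        buf.put((byte) 0);
        return buf.array();
    }

    // Appends one encoded record to an already open stream.
    static void writeRecord(OutputStream out, double longitude, double latitude, String name)
            throws IOException {
        out.write(encodeRecord(longitude, latitude, name));
    }
}
```

Calling writeRecord once per CSV row on a FileOutputStream would mirror the PHP loop. I have not checked the output against a real device, so treat it as a starting point.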
I work with huge data files, and sometimes I only need to know the number of lines in them. Usually I open them and read them line by line until I reach the end of the file.
I was wondering if there is a smarter way to do that.
This is the fastest version I have found so far, about 6 times faster than readLines. On a 150MB log file this takes 0.35 seconds, versus 2.40 seconds when using readLines(). Just for fun, Linux's wc -l command takes 0.15 seconds.
public static int countLinesOld(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean empty = true;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
}
return (count == 0 && !empty) ? 1 : count;
} finally {
is.close();
}
}
EDIT, 9 1/2 years later: I have practically no java experience, but anyways I have tried to benchmark this code against the LineNumberReader solution below since it bothered me that nobody did it. It seems that especially for large files my solution is faster. Although it seems to take a few runs until the optimizer does a decent job. I've played a bit with the code, and have produced a new version that is consistently fastest:
public static int countLinesNew(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int readChars = is.read(c);
if (readChars == -1) {
// bail out if nothing to read
return 0;
}
// make it easy for the optimizer to tune this loop
int count = 0;
while (readChars == 1024) {
for (int i=0; i<1024;) {
if (c[i++] == '\n') {
++count;
}
}
readChars = is.read(c);
}
// count remaining characters
while (readChars != -1) {
for (int i=0; i<readChars; ++i) {
if (c[i] == '\n') {
++count;
}
}
readChars = is.read(c);
}
return count == 0 ? 1 : count;
} finally {
is.close();
}
}
Benchmark results for a 1.3GB text file, y axis in seconds. I've performed 100 runs with the same file, and measured each run with System.nanoTime(). You can see that countLinesOld has a few outliers, while countLinesNew has none; it's only a bit faster, but the difference is statistically significant. LineNumberReader is clearly slower.
I have implemented another solution to the problem, which I found more efficient for counting rows:
try
(
FileReader input = new FileReader("input.txt");
LineNumberReader count = new LineNumberReader(input);
)
{
while (count.skip(Long.MAX_VALUE) > 0)
{
// Loop just in case the file is > Long.MAX_VALUE or skip() decides to not read the entire file
}
result = count.getLineNumber() + 1; // +1 because line index starts at 0
}
The accepted answer has an off-by-one error for multi-line files that don't end in a newline. A one-line file without a trailing newline returns 1, but a two-line file without a trailing newline returns 1 too. Here's an implementation of the accepted solution that fixes this. The endsWithoutNewLine checks are wasteful for everything but the final read, but should be trivial time-wise compared to the overall function.
public int count(String filename) throws IOException {
InputStream is = new BufferedInputStream(new FileInputStream(filename));
try {
byte[] c = new byte[1024];
int count = 0;
int readChars = 0;
boolean endsWithoutNewLine = false;
while ((readChars = is.read(c)) != -1) {
for (int i = 0; i < readChars; ++i) {
if (c[i] == '\n')
++count;
}
endsWithoutNewLine = (c[readChars - 1] != '\n');
}
if(endsWithoutNewLine) {
++count;
}
return count;
} finally {
is.close();
}
}
With Java 8, you can use streams:
try (Stream<String> lines = Files.lines(path, Charset.defaultCharset())) {
long numOfLines = lines.count();
...
}
The answer with the method count() above gave me line miscounts if a file didn't have a newline at the end of the file - it failed to count the last line in the file.
This method works better for me:
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}
I tested the above methods for counting lines; here are my observations for the different methods as tested on my system.
File size: 1.6 GB
Methods:
Using Scanner : 35s approx
Using BufferedReader : 5s approx
Using Java 8 : 5s approx
Using LineNumberReader : 5s approx
Moreover, the Java 8 approach seems quite handy:
Files.lines(Paths.get(filePath), Charset.defaultCharset()).count()
[Return type : long]
I know this is an old question, but the accepted solution didn't quite match what I needed it to do. So, I refined it to accept various line terminators (rather than just line feed) and to use a specified character encoding (rather than ISO-8859-n). All in one method (refactor as appropriate):
public static long getLinesCount(String fileName, String encodingName) throws IOException {
long linesCount = 0;
File file = new File(fileName);
FileInputStream fileIn = new FileInputStream(file);
try {
Charset encoding = Charset.forName(encodingName);
Reader fileReader = new InputStreamReader(fileIn, encoding);
int bufferSize = 4096;
Reader reader = new BufferedReader(fileReader, bufferSize);
char[] buffer = new char[bufferSize];
int prevChar = -1;
int readCount = reader.read(buffer);
while (readCount != -1) {
for (int i = 0; i < readCount; i++) {
int nextChar = buffer[i];
switch (nextChar) {
case '\r': {
// The current line is terminated by a carriage return or by a carriage return immediately followed by a line feed.
linesCount++;
break;
}
case '\n': {
if (prevChar == '\r') {
// The current line is terminated by a carriage return immediately followed by a line feed.
// The line has already been counted.
} else {
// The current line is terminated by a line feed.
linesCount++;
}
break;
}
}
prevChar = nextChar;
}
readCount = reader.read(buffer);
}
if (prevChar != -1) {
switch (prevChar) {
case '\r':
case '\n': {
// The last line is terminated by a line terminator.
// The last line has already been counted.
break;
}
default: {
// The last line is terminated by end-of-file.
linesCount++;
}
}
}
} finally {
fileIn.close();
}
return linesCount;
}
This solution is comparable in speed to the accepted solution, about 4% slower in my tests (though timing tests in Java are notoriously unreliable).
/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (Stream<String> lines = Files.lines(file.toPath())) {
return lines.count();
}
}
Tested on JDK8_u31. But indeed performance is slow compared to this method:
/**
* Count file rows.
*
* @param file file
* @return file row count
* @throws IOException
*/
public static long getLineCount(File file) throws IOException {
try (BufferedInputStream is = new BufferedInputStream(new FileInputStream(file), 1024)) {
byte[] c = new byte[1024];
boolean empty = true,
lastEmpty = false;
long count = 0;
int read;
while ((read = is.read(c)) != -1) {
for (int i = 0; i < read; i++) {
if (c[i] == '\n') {
count++;
lastEmpty = true;
} else if (lastEmpty) {
lastEmpty = false;
}
}
empty = false;
}
if (!empty) {
if (count == 0) {
count = 1;
} else if (!lastEmpty) {
count++;
}
}
return count;
}
}
Tested and very fast.
A straight-forward way using Scanner
static void lineCounter (String path) throws IOException {
int lineCount = 0, commentsCount = 0;
Scanner input = new Scanner(new File(path));
while (input.hasNextLine()) {
String data = input.nextLine();
if (data.startsWith("//")) commentsCount++;
lineCount++;
}
System.out.println("Line Count: " + lineCount + "\t Comments Count: " + commentsCount);
}
I concluded that wc -l's method of counting newlines is fine, but it returns non-intuitive results on files where the last line doesn't end with a newline.
And @er.vikas's solution, based on LineNumberReader but adding one to the line count, returns non-intuitive results on files where the last line does end with a newline.
I therefore wrote an algorithm that behaves as follows:
@Test
public void empty() throws IOException {
    assertEquals(0, count(""));
}
@Test
public void singleNewline() throws IOException {
    assertEquals(1, count("\n"));
}
@Test
public void dataWithoutNewline() throws IOException {
    assertEquals(1, count("one"));
}
@Test
public void oneCompleteLine() throws IOException {
    assertEquals(1, count("one\n"));
}
@Test
public void twoCompleteLines() throws IOException {
    assertEquals(2, count("one\ntwo\n"));
}
@Test
public void twoLinesWithoutNewlineAtEnd() throws IOException {
    assertEquals(2, count("one\ntwo"));
}
@Test
public void aFewLines() throws IOException {
    assertEquals(5, count("one\ntwo\nthree\nfour\nfive\n"));
}
And it looks like this:
static long countLines(InputStream is) throws IOException {
try(LineNumberReader lnr = new LineNumberReader(new InputStreamReader(is))) {
char[] buf = new char[8192];
int n, previousN = -1;
//Read will return at least one byte, no need to buffer more
while((n = lnr.read(buf)) != -1) {
previousN = n;
}
int ln = lnr.getLineNumber();
if (previousN == -1) {
//No data read at all, i.e file was empty
return 0;
} else {
char lastChar = buf[previousN - 1];
if (lastChar == '\n' || lastChar == '\r') {
//Ending with newline, deduct one
return ln;
}
}
//normal case, return line number + 1
return ln + 1;
}
}
If you want intuitive results, you may use this. If you just want wc -l compatibility, simply use @er.vikas's solution, but don't add one to the result, and retry the skip:
try(LineNumberReader lnr = new LineNumberReader(new FileReader(new File("File1")))) {
while(lnr.skip(Long.MAX_VALUE) > 0){};
return lnr.getLineNumber();
}
How about using the Process class from within Java code? And then reading the output of the command.
Process p = Runtime.getRuntime().exec("wc -l " + yourfilename);
p.waitFor();
BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = "";
int lineCount = 0;
while ((line = b.readLine()) != null) {
System.out.println(line);
// wc -l prints "<count> <filename>", so parse only the first token
lineCount = Integer.parseInt(line.trim().split("\\s+")[0]);
}
Need to try it though. Will post the results.
It seems that there are a few different approaches you can take with LineNumberReader.
I did this:
int lines = 0;
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
String line = count.readLine();
if(count.ready())
{
while(line != null) {
lines = count.getLineNumber();
line = count.readLine();
}
lines+=1;
}
count.close();
System.out.println(lines);
Even more simply, you can use the BufferedReader lines() method to get a stream of the lines, and then use the Stream count() method to count them. The count is already the number of rows in the text file, so nothing needs to be added to it.
For example:
FileReader input = new FileReader(fileLocation);
LineNumberReader count = new LineNumberReader(input);
int lines = (int) count.lines().count(); // lines() yields one element per line
count.close();
System.out.println(lines);
This funny solution actually works really well!
public static int countLines(File input) throws IOException {
try (InputStream is = new FileInputStream(input)) {
int count = 1;
for (int aChar = 0; aChar != -1;aChar = is.read())
count += aChar == '\n' ? 1 : 0;
return count;
}
}
On Unix-based systems, use the wc command on the command-line.
The only way to know how many lines there are in a file is to count them. You can of course derive a metric from your data that gives you the average length of one line, then take the file size and divide it by that average length, but that won't be accurate.
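As a rough illustration of that estimate: if you have counted the lines in a sample of the file, you can scale the sample's line density up to the full size. The class LineEstimate, the method estimateLines and the numbers in main are all invented for this sketch, and the result is only as good as the sample is representative.

```java
final class LineEstimate {
    // Hypothetical helper: scales the line density of a sample up to the whole file.
    static long estimateLines(long fileSizeBytes, long sampleBytes, long sampleLines) {
        if (sampleBytes <= 0) {
            return 0; // nothing sampled, nothing to extrapolate from
        }
        // Same as fileSize / averageLineLength, with averageLineLength = sampleBytes / sampleLines.
        return Math.round((double) fileSizeBytes * sampleLines / sampleBytes);
    }

    public static void main(String[] args) {
        // Example numbers only: a 1,000,000-byte file whose first 10,000 bytes held 125 lines.
        System.out.println(estimateLines(1_000_000L, 10_000L, 125L));
    }
}
```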
If you don't have any index structures, you won't get around reading the complete file. But you can optimize it by not reading it line by line and instead using a regex to match all line terminators.
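A minimal sketch of the regex idea, working on text that is already in memory. The class TerminatorCounter and its method are made up for illustration, and for a really large file you would have to feed it chunk by chunk and take care of a CRLF pair split across two chunks. Note that it counts terminators, so a file whose last line has no terminator contains one more line than the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TerminatorCounter {
    // CRLF is listed first so that a Windows line ending is counted once, not twice.
    private static final Pattern LINE_END = Pattern.compile("\r\n|\r|\n");

    // Counts the line terminators in the given text.
    static long countTerminators(CharSequence text) {
        Matcher matcher = LINE_END.matcher(text);
        long count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTerminators("one\r\ntwo\nthree\rfour"));
    }
}
```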
Optimized code for multi-line files that have no newline ('\n') character at EOF:
/**
*
* @param filename
* @return
* @throws IOException
*/
public static int countLines(String filename) throws IOException {
int count = 0;
boolean empty = true;
FileInputStream fis = null;
InputStream is = null;
try {
fis = new FileInputStream(filename);
is = new BufferedInputStream(fis);
byte[] c = new byte[1024];
int readChars = 0;
boolean isLine = false;
while ((readChars = is.read(c)) != -1) {
empty = false;
for (int i = 0; i < readChars; ++i) {
if ( c[i] == '\n' ) {
isLine = false;
++count;
}else if(!isLine && c[i] != '\n' && c[i] != '\r'){ //Case to handle line count where no New Line character present at EOF
isLine = true;
}
}
}
if(isLine){
++count;
}
}catch(IOException e){
e.printStackTrace();
}finally {
if(is != null){
is.close();
}
if(fis != null){
fis.close();
}
}
LOG.info("count: "+count);
return (count == 0 && !empty) ? 1 : count;
}
Scanner with regex:
public int getLineCount() {
Scanner fileScanner = null;
int lineCount = 0;
Pattern lineEndPattern = Pattern.compile("(?m)$");
try {
fileScanner = new Scanner(new File(filename)).useDelimiter(lineEndPattern);
while (fileScanner.hasNext()) {
fileScanner.next();
++lineCount;
}
}catch(FileNotFoundException e) {
e.printStackTrace();
return lineCount;
}
fileScanner.close();
return lineCount;
}
Haven't clocked it.
If you use this:
public int countLines(String filename) throws IOException {
LineNumberReader reader = new LineNumberReader(new FileReader(filename));
int cnt = 0;
String lineRead = "";
while ((lineRead = reader.readLine()) != null) {}
cnt = reader.getLineNumber();
reader.close();
return cnt;
}
you cannot handle extremely large files, because the return value of reader.getLineNumber() is an int. 100K rows are no problem, but beyond Integer.MAX_VALUE (about 2.1 billion) rows you need a long counter.
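One way around the int limit is to count with your own long variable instead of asking the reader for its line number. This is only a sketch; the class name LongLineCounter is made up, and it takes any Reader so that it can be tried without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class LongLineCounter {
    // Counts lines with a long counter, so the int range of getLineNumber() does not matter.
    static long countLines(Reader source) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(new StringReader("one\ntwo\nthree")));
    }
}
```

For a real file you would pass in something like new FileReader(filename).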