Transferring Emojis from Spreadsheet to Java - java

I would like to transfer data from a Google Sheet (or any spreadsheet) into Java.
How emoji shows up in google sheets: "That picture 😍😍🔥🔥🔥"
How emoji shows up in downloaded TSV: "That picture "ðŸ˜ðŸ˜ðŸ”¥ðŸ”¥ðŸ”¥"
I have trouble understanding how I should be dealing with Emojis:
Is the following correct? I believe the way that emojis behave is that what I see in that first image is the HTML version of the emoji, and that there is an escaped version that looks something like \uD383\u2823
How do I proceed to transfer emojis into java:
What I want to do is be able to count the number of different emojis, so I need to separate them based on their codes.

So it seemed I was freaking out for no reason and should've just gone straight into Java hands first instead of thinking about encodings:
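1. I downloaded my spreadsheet file as TSV.
2. I parsed the TSV file using a regular BufferedReader and escaped each line with StringEscapeUtils.escapeJava:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.commons.lang3.StringEscapeUtils;

// FileReader decodes with the platform default charset, which may not be UTF-8.
// try-with-resources closes the reader when the loop finishes or fails.
try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // Print every non-ASCII character as a \\uXXXX escape
        System.out.println(StringEscapeUtils.escapeJava(line));
    }
} catch (IOException e) {
    e.printStackTrace();
}
3. Output: \u00F0\u0178\u201D\u00A5\u00F0\u0178\u201D\u00A5 for input 🔥🔥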
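One caveat about that output: \u00F0\u0178\u201D\u00A5 is what the four UTF-8 bytes of 🔥 (F0 9F 94 A5) look like when they are decoded with a single-byte charset such as windows-1252, so each emoji shows up as four unrelated characters rather than as its own code point (U+1F525). A minimal sketch of counting emojis by code point follows, assuming the TSV is saved as UTF-8. The class name EmojiCounter and the "general category So" test are illustrative assumptions, not part of the original post, and the category test is only a rough approximation of "is an emoji".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts emoji-like code points in text read as UTF-8.
final class EmojiCounter {

    // Assumption: treat code points in Unicode category So (OTHER_SYMBOL) as emoji.
    // This also matches some non-emoji symbols and misses multi-code-point sequences.
    static boolean looksLikeEmoji(int codePoint) {
        return Character.getType(codePoint) == Character.OTHER_SYMBOL;
    }

    // Counts each emoji-like code point in one string.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        text.codePoints()
            .filter(EmojiCounter::looksLikeEmoji)
            .forEach(cp -> counts.merge(new String(Character.toChars(cp)), 1, Integer::sum));
        return counts;
    }

    // Reads the file as UTF-8 (not the platform default) and sums the counts per line.
    static Map<String, Integer> countFile(Path tsv) throws IOException {
        Map<String, Integer> totals = new TreeMap<>();
        try (BufferedReader reader = Files.newBufferedReader(tsv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                count(line).forEach((emoji, n) -> totals.merge(emoji, n, Integer::sum));
            }
        }
        return totals;
    }
}
```

Iterating with codePoints() rather than charAt() matters here because each of these emojis occupies two Java chars (a surrogate pair), and counting chars would split them.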

Related

How to get file content properly from a jpg file?

I'm trying to get content from a jpg file so I can encrypt that content and save it in another file that is later decrypted.
I'm trying to do so by reading the jpg file as if it were a text file with this code:
String aBuffer = "";
try {
    File myFile = new File(pathRoot);
    FileInputStream fIn = new FileInputStream(myFile);
    BufferedReader myReader = new BufferedReader(new InputStreamReader(fIn));
    String aDataRow = "";
    while ((aDataRow = myReader.readLine()) != null) {
        aBuffer += aDataRow;
    }
    myReader.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
But this doesn't give the content the file has, just a short string and weirdly enough it also looks like just reading the file corrupts it.
What could I do so I can achieve the desired behavior?
Image files aren't text - but you're treating the data as textual data. Basically, don't do that. Use the InputStream to load the data into a byte array (or preferably, use Files.readAllBytes(Path) to do it rather more simply).
Keep the binary data as binary data. If you absolutely need a text representation, you'll need to encode it in a way that doesn't lose data - where hex or base64 are the most common ways of doing that.
You mention encryption early in the question: encryption also generally operates on binary data. Any encryption methods which provide text options (e.g. string parameters) are just convenience wrappers which encode the text as binary data and then encrypt it.
"and weirdly enough it also looks like just reading the file corrupts it."
I believe you're mistaken about that. Just reading from a file will not change it in any way. Admittedly you're not using try-with-resources statements, so you could end up keeping the file handle open, potentially preventing another process from reading it - but the content of the file itself won't change.
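As a rough illustration of the advice above, the sketch below reads a file as raw bytes with Files.readAllBytes and, only where a text form is needed, encodes those bytes as Base64 so that nothing is lost. The class name BinaryFiles and its method names are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper: keeps binary data binary, with a lossless text form on request.
final class BinaryFiles {

    // Loads the whole file as bytes; no charset decoding is involved.
    static byte[] readBytes(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // Lossless text representation of arbitrary bytes.
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Inverse of toBase64: recovers exactly the original bytes.
    static byte[] fromBase64(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

The byte array returned by readBytes is what an encryption routine would normally take as input; the Base64 step is only needed if the result has to pass through something that accepts text only.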

OpenCSV reads strange text out of file

I am using Android Studio and my application has to read in a CSV file which looks like this:
"Anmeldung";"1576017126809898";"1547126680978123";"";"";"Frau"
"Anmeldung";"1547126680911112";"1547126680978123";"";"";"Frau"
But as you can see in the following picture, OpenCSV reads some strange characters, and my List contains meaningless Strings which are not in the file it read.
This is how I read the Data out of my file:
try {
    FileReader filereader = new FileReader(filePath);
    CSVParser parser = new CSVParserBuilder().withSeparator(';').build();
    CSVReader csvReader = new CSVReaderBuilder(filereader)
            .withSkipLines(1)
            .withCSVParser(parser)
            .build();
    List<String[]> allData = csvReader.readAll();
    MainActivity.setAllData(allData);
}
catch (Exception e) {
    e.printStackTrace();
}
Thank you
It looks like there is an encoding problem.
Make sure to open and parse the file with the proper encoding (for example utf-8 or utf-16). Same for viewing the data.
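One way to act on this, sketched below with the standard library only, is to peek at the first two bytes for a UTF-16 byte order mark and open the reader with a matching charset. The class name CsvCharsets is hypothetical, and falling back to UTF-8 when no mark is found is an assumption; a file without a byte order mark can still be in some other encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: picks a charset from a UTF-16 byte order mark, if present.
final class CsvCharsets {

    // FE FF or FF FE at the start indicates UTF-16; otherwise assume UTF-8.
    static Charset guess(byte[] head) {
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                // Java's UTF-16 decoder reads the mark to pick the byte order.
                return StandardCharsets.UTF_16;
            }
        }
        return StandardCharsets.UTF_8; // assumption, not a detection result
    }

    // Opens the file with the guessed charset instead of the platform default.
    static BufferedReader open(Path file) throws IOException {
        byte[] head = new byte[2];
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            n = in.read(head);
        }
        Charset charset = guess(n == 2 ? head : new byte[0]);
        return Files.newBufferedReader(file, charset);
    }
}
```

Since CSVReaderBuilder in the question is constructed from a Reader, the reader returned by open could be passed to it in place of the FileReader.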
I figured it out. It might sound strange but I took the file and replaced all ; with ;
I think the data I got was exported with a UTF-16 encoding or from a Linux device.
tl;dr The file had the wrong encoding, and the way I opened and viewed it was correct.

Reading a large compressed file using Apache Commons Compress

I'm trying to read a bz2 file using Apache Commons Compress.
The following code works for a small file.
However, for a large file (over 500MB), it ends after reading a few thousand lines without any error.
try {
    InputStream fin = new FileInputStream("/data/file.bz2");
    BufferedInputStream bis = new BufferedInputStream(fin);
    CompressorInputStream input = new CompressorStreamFactory()
            .createCompressorInputStream(bis);
    BufferedReader br = new BufferedReader(new InputStreamReader(input,
            "UTF-8"));
    String line = "";
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
} catch (Exception e) {
    e.printStackTrace();
}
Is there another good way to read a large compressed file?
I was having the same problem with a large file, until I noticed that CompressorStreamFactory has a couple of overloaded constructors that take a boolean decompressUntilEOF parameter.
Simply changing to the following may be all that's missing...
CompressorInputStream input = new CompressorStreamFactory(true)
.createCompressorInputStream(bis);
Clearly, whoever wrote this factory seems to think it's better to create new compressor input streams at certain points, with the same underlying buffered input stream so that the new one picks up where the last one left off. They seem to think that's a better default, or preferred way of doing it over allowing one stream to decompress data all the way to the end of the file. I've no doubt they are cleverer than me, and I haven't worked out what trap I'm setting for future me by setting this parameter to true. Maybe someone will tell me in the comments! :-)

Java: easiest way to read from an Excel-style document?

I'm trying to find the best way to read in data from a file similar to an Excel document. It doesn't necessarily need to be an actual excel document, just any file that allows you to enter data in a grid format.
Something where I would be able to do manipulation similar to this:
String val = file.readString(column,row);
float val2 = file.readFloat(column,row);
I'm sorry, I usually try to do more research before I post a question here but I was having a hard time finding much info. A lot of what I saw was third-party libraries that read Excel files. I'm really hoping that, if possible, I can avoid downloading libraries and use built-in ones.
So I guess my questions in short are:
What's the most appropriate file format for this?
What's the best way to read data from that file?
The first thing that comes to my mind is CSV. CSV files are just regular text files with the .csv filename extension. Data is stored in this format:
cell,anothercell,athirdcell
anotherrow,anothercellonthenewrow,thirdcellofsecondrow
For more specifics, read the CSV specs here.
Option 1
Store your data in a CSV and read with any kind of reader (e.g. BufferedReader). This might be the easiest and fastest solution, if you want to use Excel/LibreOffice for entering data.
Please check out the answers in these threads for various solutions.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

String csvFile = path;
String csvSplitBy = ";";
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
    String line;
    while ((line = br.readLine()) != null) {
        // One array element per cell in this row
        String[] cells = line.split(csvSplitBy);
        // do stuff
    }
} catch (IOException e) {
    e.printStackTrace();
}
Hope I didn't miss anything important.
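To get closer to the readString(column,row) / readFloat(column,row) access described in the question, the rows can be kept in memory behind a small wrapper. This is only a sketch using built-in classes: the CsvGrid class is hypothetical, and the plain split does not handle quoted cells that contain the separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical grid wrapper over already-read CSV lines.
final class CsvGrid {
    private final List<String[]> rows = new ArrayList<>();

    // lines: one string per row; separator: e.g. ";" or ","
    CsvGrid(List<String> lines, String separator) {
        String regex = Pattern.quote(separator); // treat the separator literally
        for (String line : lines) {
            // limit -1 keeps trailing empty cells instead of dropping them
            rows.add(line.split(regex, -1));
        }
    }

    // Zero-based column and row indices.
    String readString(int column, int row) {
        return rows.get(row)[column];
    }

    // Throws NumberFormatException if the cell is not a number.
    float readFloat(int column, int row) {
        return Float.parseFloat(readString(column, row).trim());
    }
}
```

The lines argument could come from Files.readAllLines(path, StandardCharsets.UTF_8), after which new CsvGrid(lines, ";").readFloat(1, 1) would read the second cell of the second row.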
Option 2
Use Apache POI.
Option 3
I've had decent experiences with JXL, but I understand that you don't want to include too many external libs. (I just saw that it hasn't been updated in a while. Consider the other options!)

BufferedReader, read chars in an edittext gives strange chars

Ok, I am reading a .docx file via a BufferedReader and want to store the text in an EditText. The .docx is not in English but in a different language (Greek). I use:
File file = new File(file_Path);
try {
    BufferedReader br = new BufferedReader(new FileReader(file));
    String line;
    StringBuilder text = new StringBuilder();
    while ((line = br.readLine()) != null) {
        text.append(line);
    }
    et1.setText(text);
And the result I get is this:
If the characters are in English, it works fine. But in my case they aren't. How can I fix this? Thanks a lot
"Ok, I am reading a .docx file via a BufferedReader"
Well that's the first problem. BufferedReader is for plain text files. docx files are binary files in a specific format (assuming you mean the kind of file that Microsoft Word saves). You can't just read them like text files. Open the file up in Notepad (not Wordpad) and you'll see what I mean.
You might want to look at Apache POI.
From comments:
"Testing to read a .txt file with the same text gave same results too"
That's probably due to using the wrong encoding. FileReader always uses the platform default encoding, which is annoying. Assuming you're using Java 7 or higher, you'd be better off with Files.newBufferedReader:
try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
...
}
Adjust the charset to match the one you used when saving your text file, of course - if you have the option of using UTF-8, that's a pretty good choice. (Aside from anything else, pretty much everything can handle UTF-8.)
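To make the effect of a charset mismatch concrete, here is a small self-contained sketch that decodes the same bytes twice. The Greek sample word and the choice of ISO-8859-1 as the "wrong" charset are assumptions for illustration; the actual default on the asker's device is not known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: the same bytes give different text under different charsets.
final class DecodeDemo {

    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Kalimera" in Greek letters, written with escapes so the source file's
        // own encoding does not matter.
        String greek = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";
        byte[] saved = greek.getBytes(StandardCharsets.UTF_8); // file saved as UTF-8

        String right = decode(saved, StandardCharsets.UTF_8);
        String wrong = decode(saved, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(greek)); // true
        System.out.println(wrong.equals(greek)); // false
        // Each Greek letter is two bytes in UTF-8, so the wrong decoding
        // yields twice as many characters.
        System.out.println(right.length() + " vs " + wrong.length()); // 8 vs 16
    }
}
```

English text survives this kind of mismatch because ASCII characters have the same single-byte encoding in both charsets, which matches the behavior described in the question.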
