Why do additional characters appear when saving a String to an HTML file in Java? - java

I've looked through "similar" questions but wasn't able to get an answer to mine. Please point me to one if it already exists.
Problem: when I save a String/StringBuilder to an HTML file, additional characters appear at the beginning of the page, and I can't figure out why. Example:
’tX<!DOCTYPE html>
<html>
method:
public void saveToHTML(){
String fileName = "";
if (docName != null){
fileName += docName;
} else {
fileName += stdFileName;
}
fileName += "HTML.html";
String tempText = new String("<!DOCTYPE html>\n<html>\n\t<body>");
int tabCount = 3;
for (int oneSec = 0; oneSec < allSections.size(); oneSec++){
for (int onePar = 0; onePar < allSections.get(oneSec).getCountParagraphs(); onePar++){
tempText += (convertParToHTML(allSections.get(oneSec).getParagraph(onePar),
tabCount));
}
}
tempText += ("\n\t</body>\n</html>");
serializeDoc(fileName, tempText.toString());
}
serializeDoc() below:
/**
* Helper method to serialize files
*
* @param fileName name of the file to be saved with
* @param object object to be saved in the file
* @throws IOException
*/
private void serializeDoc(String fileName, Object object){
try {
FileOutputStream file = new FileOutputStream(fileName);
ObjectOutputStream out = new ObjectOutputStream(file);
out.writeObject(object);
out.close();
} catch (IOException e){
System.out.println("The file couldn't be created");
}
}

You haven't posted serializeDoc, so we really can't say. But I will tell you this: you really need to keep track of the charset when writing text files. Outputting the same text in ASCII, Latin-1, UTF-8, UTF-16, etc. will give you different file sizes and different bytes. The best way to ensure consistency is to use FileWriter and FileReader (or, before Java 11, OutputStreamWriter and InputStreamReader), where you can declare the charset.
-- update --
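Yikes, yikes, and YIKES! You do NOT want to use object serialization here. ObjectOutputStream saves your Java object in Java's binary serialization format: it writes a stream header and a type-and-length prefix before the string's characters, and those bytes are the stray characters at the top of your page. It also makes the file harder to read and adjust manually. Writing the string's bytes straight to the FileOutputStream would be better, but as I said, the best solution is a FileWriter, so that you can specify the charset to save in.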
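To make the difference concrete, here is a minimal sketch that writes the text as plain UTF-8 bytes and, for comparison, shows what ObjectOutputStream produces for the same kind of input. The class and method names (HtmlTextSaver, saveText, serialize) are made up for illustration and are not part of the original code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HtmlTextSaver {
    // Writes the text as UTF-8 bytes; nothing is added before the first character.
    static void saveText(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    // For comparison only: what writeObject() emits for a String.
    // The result starts with the stream header (0xAC 0xED 0x00 0x05),
    // then a string type tag (0x74, the letter 't') and a two-byte length.
    static byte[] serialize(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(text);
        }
        return bytes.toByteArray();
    }
}
```

In saveToHTML(), the call to serializeDoc(fileName, tempText) could then be replaced by something like HtmlTextSaver.saveText(Paths.get(fileName), tempText).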

Related

Java FileWriter not writing to file

I'm creating an app that needs to write a file for every user (in JSON format).
The app successfully creates the file, but it's empty. In the code I added a logger.finer call to see what the converted JSON string looks like, and it is complete (it contains everything, so the conversion is OK). But the string isn't present in the file.
//create a new FileWriter C:/.../42.guser
FileWriter writer = null;
//for every User
for(int i=0; i<users.size(); i++) {
try {
File f = new File(Users.usersDir.getPath()+"/"+i+".guser");
//Create new file
f.createNewFile();
writer = new FileWriter(f);
//convert to json and write to file. Here we get the object with KEY = keys[i].
String stringedUsr = gson.toJson(users.get(keys[i]));
logger.finer("Converted user: \""+stringedUsr+"\""); //Output seems ok
//Write (NOT WORKING)
writer.write(stringedUsr);
logger.fine("Wrote updated user \""+keys[i]+"\" to file "+f.getCanonicalPath());
} catch (IOException e) {
FAILS++;
e.printStackTrace();
}
}
//End of for
What am I doing wrong?
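Note: I gave the app all the necessary permissions. There are no AccessControlExceptions.
First of all, close your writer. In your loop a new FileWriter is created on every iteration and none of them is ever closed, so the buffered text never reaches the disk.
Closing the writer flushes its data first (as mentioned in the documentation).
You can also call flush() manually.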
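A minimal sketch of that advice, using try-with-resources so that the writer is closed (and therefore flushed) even if write() throws. The class and method names (UserFileWriter, writeUser) are hypothetical; the JSON string is assumed to come from gson.toJson(...) as in the question.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

class UserFileWriter {
    // Writes one user's JSON to its own file.
    // FileWriter creates the file if it does not exist, so createNewFile() is not needed.
    static void writeUser(File target, String json) throws IOException {
        try (FileWriter writer = new FileWriter(target)) {
            writer.write(json);
        } // close() runs here and flushes the buffered characters to disk
    }
}
```

Inside the loop from the question, the body of the try block would shrink to a single call such as writeUser(f, stringedUsr).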

Read bytes from file Java

I'm trying to parse a file that keeps all its data in binary form. How do I read N bytes from the file at offset M? I then need to convert them to a String using new String(myByteArray, "UTF-8");. Thanks!
Here's some code:
File file = new File("my_file.txt");
byte[] myByteArray = new byte[(int) file.length()];
UPD 1: The answers I've seen so far don't fit my case. My file keeps strings in byte form; for example, when I put the string "str" in my file, it actually prints something like [B@6e0b... in the file. So I need to get my string "str" back from this byte code.
UPD 2: As it turns out, the problem appears when I use toString():
PrintWriter writer = new PrintWriter(new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(System.getProperty("db.file")), true), "UTF-8")));
Iterator it = storage.entrySet().iterator();//storage is a map<String, String>
while (it.hasNext()){
Map.Entry pairs = (Map.Entry)it.next();
String K = new String(pairs.getKey().toString());
String V = new String(pairs.getValue().toString());
writer.println(K.length() + " " + K.getBytes() + " " + V.length() + " " + V.getBytes());//this is just the format I need to have in file
it.remove();
}
Maybe there are other ways to do that?
As of Java 7, reading the whole of a file is really easy: just use Files.readAllBytes(path). For example:
Path path = Paths.get("my_file.txt");
byte[] data = Files.readAllBytes(path);
If you need to do this more manually, you should use a FileInputStream - your code so far allocates an array, but doesn't read anything from the file.
To read just a portion of a file, you should look at using RandomAccessFile, which allows you to seek to wherever you want. Be aware that the read(byte[]) method does not guarantee to read all the requested data in one go, however. You should loop until either you've read everything you need, or use readFully instead. For example:
public static byte[] readPortion(File file, int offset, int length)
throws IOException {
byte[] data = new byte[length];
try (RandomAccessFile raf = new RandomAccessFile(file, "r")) { // the mode argument is required; "r" opens read-only
raf.seek(offset);
raf.readFully(data);
}
return data;
}
EDIT: Your update talks about seeing text such as [B@6e0b... That suggests you're calling toString() on a byte[] at some point (string concatenation, as in your println call, does this implicitly). Don't do that. Instead, you should use new String(data, StandardCharsets.UTF_8) or something similar, picking the appropriate encoding, of course.
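A small sketch of the difference, with hypothetical names (ByteTextDemo, encode, decode): encoding and decoding with the same explicit charset round-trips the text, while toString() on the array only yields the array's type and identity hash.

```java
import java.nio.charset.StandardCharsets;

class ByteTextDemo {
    // Turns text into bytes using an explicit charset.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Turns bytes back into text using the same charset.
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = encode("str");
        System.out.println(decode(data)); // the original text
        System.out.println(data);         // something like [B@..., not the text
    }
}
```

If the file is meant to hold the text itself, writing K and V directly (rather than K.getBytes()) through the UTF-8 OutputStreamWriter from the question already stores them as UTF-8.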

Case study: Is this an effective way to split a file?

So I'm working my way through a task in my Java course at school. To make clear what the code is supposed to do, I'll quote the task:
"(Split files) Suppose you want to back up a huge file(e.g., a 10-GB AVI file) to a CD-R. You can achieve it by splitting the file into smaller pieces and backing up these pieces separately. Write a utility program that splits a large file into smaller ones using the following command: java ClassName SourceFile numberOfPieces
The command creates the files SourceFile.1, SourceFile2...etc"
Now, to be clear: this post is in no way an attempt to get a "solution" to the problem. I have solved it (with what I know), and I merely want to get more enlightened on some matters that crossed my mind when writing the code.
1. Is it necessary to create a new output stream for every single file I'm copying to? Doesn't this demand unnecessary system resources?
2. The first file that gets copied (SourceFile is in this case a .png file) can be viewed, and it shows a fraction of the original picture. (If I split into two, I can view half the picture.) But I can't view the later ones. Why is that?
3. Is it possible to reassemble the split files in any way? If my picture was split into two files, can I put them back together and view the whole picture?
The code, if you want to look at it.
All feedback is welcome,
Have a good day! :)
package oblig2;
import java.io.*;
import java.util.*;
public class Test {
/**
* Main method
*
* @param args[0] for source file
* @param args[1] for number of pieces
* @throws IOException
*/
public static void main(String[] args) throws IOException {
// The program needs to be executed with two parameters in order to
// work. This sentence check for it.
if (args.length != 2) {
System.out.println("Usage: java Copy sourceFile numberOfPieces");
System.exit(1);
}
// Check whether or not the sourcefile exists
File sourceFile = new File(args[0]);
if (!sourceFile.exists()) {
System.out.println("Source file " + args[0] + " does not exist");
System.exit(2);
}
// Need an Array to store all the new files that is supposed to contain
// parts of the original file
ArrayList<File> fileArray = new ArrayList<File>();
// All the new files need their own output(or do they?)
ArrayList<BufferedOutputStream> outputArray = new ArrayList<BufferedOutputStream>();
// Using randomAccessFile on the sourcefile to easier read parts of it
RandomAccessFile inOutSourceFile = new RandomAccessFile(sourceFile,
"rw");
// This loop changes the name for the new files, so they match the
// sourcefile with an appended digit
for (int i = 0; i < Integer.parseInt(args[1]); i++) {
String nameAppender = String.valueOf(i);
String nameBuilder;
int suffix = args[0].indexOf(".");
nameBuilder = args[0].substring(0, suffix);
fileArray.add((new File(nameBuilder + nameAppender + ".dat")));
}
// Here i create the output needed for all the new files
for (int i = 0; i < Integer.parseInt(args[1]); i++) {
outputArray.add(new BufferedOutputStream(new FileOutputStream(
new File(fileArray.get(i).getAbsolutePath()))));
}
// Now i determine in how many parts the sourcefile needs to be split,
// and the size of each.
float size = inOutSourceFile.length();
double parts = Integer.parseInt(args[1]);
double partSize = size / parts;
int r, numberOfBytesCopied = 0;
// This loop actually does the job of copying the parts into the new
// files
for (int i = 1; i <= parts; i++) {
while (inOutSourceFile.getFilePointer() < partSize * i) {
r = inOutSourceFile.readByte();
outputArray.get(i - 1).write((byte) r);
numberOfBytesCopied++;
}
}
// Here i close the input and outputs
inOutSourceFile.close();
for (int i = 0; i < parts; i++) {
outputArray.get(i).close();
}
// Display the operations
System.out.println(args[0] + " Has been split into " + args[1]
+ " pieces. " + "\n" + "Each file containig " + partSize
+ " Bytes each.");
}
}
1. Of course it is necessary to open every output file, but you don't have to keep them all open at the same time. You can open the first file, write to it, close it, open the second file, write to it, close it, and so on.
2. A file format such as .png has a structure that must be followed. It may have a special header and a special footer. So when the file is split into two or more pieces, the first piece loses its footer, the middle pieces lose both header and footer, and the last piece loses its header. That makes them unusable as individual files.
3. Of course it is possible. Concatenating all the parts in their original order restores the original file.
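The first and third points can be sketched as follows. This is only one possible approach, not the asker's code: the names FileSplitter, split, and join are made up, the piece names follow the SourceFile.1, SourceFile.2 pattern as an assumption, and the copy uses a block buffer instead of reading one byte at a time. Only one output stream is open at any moment.

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class FileSplitter {
    // Splits source into the given number of pieces named <source>.1, <source>.2, ...
    static List<Path> split(Path source, int pieces) throws IOException {
        if (pieces < 1) {
            throw new IllegalArgumentException("pieces must be at least 1");
        }
        long total = Files.size(source);
        long base = total / pieces;       // minimum size of each piece
        long remainder = total % pieces;  // the first 'remainder' pieces get one extra byte
        List<Path> parts = new ArrayList<>();
        byte[] buffer = new byte[8192];
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source))) {
            for (int i = 1; i <= pieces; i++) {
                long remaining = base + (i <= remainder ? 1 : 0);
                Path part = source.resolveSibling(source.getFileName() + "." + i);
                // Open, fill, and close one piece before moving on to the next.
                try (OutputStream out = Files.newOutputStream(part)) {
                    while (remaining > 0) {
                        int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                        if (n < 0) {
                            throw new EOFException("source ended before all pieces were written");
                        }
                        out.write(buffer, 0, n);
                        remaining -= n;
                    }
                }
                parts.add(part);
            }
        }
        return parts;
    }

    // Reassembles the original by concatenating the pieces in order.
    static void join(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }
}
```

Working with long sizes rather than float avoids the loss of precision that would otherwise appear for multi-gigabyte files.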

Check if the file is of a certain type

I want to validate that all the files in a directory are of a certain type. What I have done so far:
private static final String[] IMAGE_EXTS = { "jpg", "jpeg" };
private void validateFolderPath(String folderPath, final String[] ext) {
File dir = new File(folderPath);
int totalFiles = dir.listFiles().length;
// Filter the files with JPEG or JPG extensions.
File[] matchingFiles = dir.listFiles(new FileFilter() {
public boolean accept(File pathname) {
return pathname.getName().endsWith(ext[0])
|| pathname.getName().endsWith(ext[1]);
}
});
// Check if all the files have JPEG or JPG extensions
// Terminate if validation fails.
if (matchingFiles.length != totalFiles) {
System.out.println("All the files should be of type " + ext[0]
+ " or " + ext[1]);
System.exit(0);
} else {
return;
}
}
This works fine if the file names have an extension, like {file.jpeg, file.jpg}.
It fails if the files have no extension, like {file1, file2}.
When I do the following in my terminal I get:
$ file folder/file1
folder/file1: JPEG image data, JFIF standard 1.01
Update 1:
I tried to get the magic numbers of the file to check if it is JPEG:
for (int i = 0; i < totalFiles; i++) {
DataInputStream input = new DataInputStream(
new BufferedInputStream(new FileInputStream(
dir.listFiles()[i])));
if (input.readInt() == 0xffd8ffe0) {
isJPEGFlag = true;
} else {
isJPEGFlag = false;
try {
input.close();
} catch (IOException ignore) {
}
System.out.println("File not JPEG");
System.exit(0);
}
}
I ran into another problem: there are some .DS_Store files in my folder.
Any idea how to ignore them?
Firstly, file extensions are not mandatory; a file without an extension could very well be a valid JPEG file.
Check the specification of the JPEG format: file formats generally start with some fixed sequence of bytes that identifies the format. This is definitely not straightforward, but I am not sure there is a better way.
In a nutshell, you have to open each file, read the first n bytes (n depends on the file format), and check whether they match the format you expect. If they do, the file carries a JPEG signature even if it has an .exe extension or no extension at all.
For JPEGs you can do the magic number check on the header of the file (C# example):
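In Java, that check might look like the sketch below. It assumes that testing for the leading bytes FF D8 FF is enough for this purpose, and it skips entries whose names start with a dot, which covers .DS_Store. The names JpegCheck, hasJpegSignature, and allJpegs are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class JpegCheck {
    // A JPEG stream starts with the SOI marker FF D8, followed by another marker (FF xx).
    static boolean hasJpegSignature(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            int b0 = in.read();
            int b1 = in.read();
            int b2 = in.read(); // read() returns -1 at end of file, which fails the check
            return b0 == 0xFF && b1 == 0xD8 && b2 == 0xFF;
        }
    }

    // True if every regular, non-hidden file in the directory has a JPEG signature.
    static boolean allJpegs(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (name.startsWith(".") || !Files.isRegularFile(entry)) {
                    continue; // ignore .DS_Store, other dot files, and subdirectories
                }
                if (!hasJpegSignature(entry)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Checking only the first three bytes accepts both JFIF and EXIF files, unlike a comparison against the full four-byte value 0xffd8ffe0.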
static bool HasJpegHeader(string filename)
{
using (BinaryReader br = new BinaryReader(File.Open(filename, FileMode.Open)))
{
UInt16 soi = br.ReadUInt16();
UInt16 jfif = br.ReadUInt16();
return soi == 0xd8ff && jfif == 0xe0ff;
}
}
A more complete method that covers EXIF as well is here: C# How can I test a file is a jpeg?
One good (though expensive) check for validity as an image understood by Java SE is to try to ImageIO.read(File) it. That method returns null if no registered reader recognizes the file as an image, and throws an IOException if reading fails.
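A minimal sketch of that approach, with hypothetical names (ImageProbe, isReadableImage). Note that it accepts any format ImageIO can decode, not only JPEG, and that it decodes the whole image, which is why it is expensive.

```java
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

class ImageProbe {
    // True if ImageIO finds a reader for the file and decodes it without an I/O error.
    static boolean isReadableImage(File file) {
        try {
            return ImageIO.read(file) != null; // null means no registered reader matched
        } catch (IOException e) {
            return false; // unreadable file or decoding failure
        }
    }
}
```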

How do I get the filename of a file inside a gzip in Java?

int BUFFER_SIZE = 4096;
byte[] buffer = new byte[BUFFER_SIZE];
// try-with-resources closes both streams, even if an exception is thrown
try (InputStream input = new GZIPInputStream(new FileInputStream("a_gunzipped_file.gz"));
OutputStream output = new FileOutputStream("current_output_name")) {
int n = input.read(buffer, 0, BUFFER_SIZE);
while (n >= 0) {
output.write(buffer, 0, n);
n = input.read(buffer, 0, BUFFER_SIZE);
}
} catch (IOException e) {
System.out.println("error: \n\t" + e.getMessage());
}
Using the above code I can successfully extract a gzip file's contents, although the extracted file will always be named current_output_name (I know that's because I declared it that way in the code). My problem is that I don't know how to get the file's name while it is still inside the archive.
Although java.util.zip provides ZipEntry, I couldn't use it on gzip files.
Any alternatives?
I mostly agree with Michael Borgwardt's reply, but it is not entirely true: the gzip file specification allows an optional file name to be stored in the header of the .gz file. Sadly, there is no way (as far as I know) of getting that name in current Java (1.6). As seen in the readHeader method of GZIPInputStream in OpenJDK, the implementation skips over the file name:
// Skip optional file name
if ((flg & FNAME) == FNAME) {
while (readUByte(in) != 0) ;
}
I have modified the GZIPInputStream class, starting from the original OpenJDK source, to get the optional file name out of the gzip archive (I'm not sure if I am allowed to do that). You only need to add a member String filename; to the class and modify the above code to be:
// Read optional file name
if ((flg & FNAME) == FNAME) {
filename= "";
int _byte = 0;
while ((_byte= readUByte(in)) != 0){
filename += (char)_byte;
}
}
and it worked for me.
Apache Commons Compress offers two options for obtaining the filename:
With metadata (Java 7+ sample code)
try (GzipCompressorInputStream gcis =
new GzipCompressorInputStream(new FileInputStream("a_gunzipped_file.gz"))) {
String filename = gcis.getMetaData().getFilename();
}
With "the convention"
String filename = GzipUtils.getUncompressedFilename("a_gunzipped_file.gz");
References
Apache Commons Compress
GzipCompressorInputStream
See also: GzipUtils#getUncompressedFilename
Actually, the gzip file format allows the original file name to be specified: each member's header has a flags byte, and when its FNAME bit is set, the original file name follows as a zero-terminated string. I do not see a way to do this in the Java libraries, though.
http://www.gzip.org/zlib/rfc-gzip.html#specification
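Without modifying GZIPInputStream or adding a library, the header can also be parsed by hand. The sketch below follows the member header layout from the specification linked above (10 fixed bytes, then the optional extra field, then the optional zero-terminated name). The names GzipHeaderName and readOriginalName are made up, and the method returns null when no name is stored.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class GzipHeaderName {
    private static final int FEXTRA = 0x04; // flag bit: extra field present
    private static final int FNAME = 0x08;  // flag bit: original file name present

    // Reads the original file name from the start of a gzip stream, or returns null.
    static String readOriginalName(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] fixed = new byte[10]; // ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
        in.readFully(fixed);
        if ((fixed[0] & 0xFF) != 0x1F || (fixed[1] & 0xFF) != 0x8B) {
            throw new IOException("not a gzip stream");
        }
        int flags = fixed[3] & 0xFF;
        if ((flags & FEXTRA) != 0) {
            // XLEN is a little-endian 16-bit length, followed by that many bytes.
            int length = in.readUnsignedByte() | (in.readUnsignedByte() << 8);
            in.readFully(new byte[length]);
        }
        if ((flags & FNAME) == 0) {
            return null;
        }
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = in.readUnsignedByte()) != 0) {
            name.write(b);
        }
        // The specification defines the name as ISO 8859-1 (Latin-1) characters.
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

A call such as readOriginalName(new FileInputStream("a_gunzipped_file.gz")) would be made on a separate stream before decompressing, since the method consumes the header bytes. Files produced by java.util.zip.GZIPOutputStream carry no name, so the result is null for them.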
Following the answers above, here is an example that creates a file "myTest.csv.gz" that decompresses to "myTest.csv". Notice that with GZIPOutputStream you can't set the internal file name, and you can't add more files to the .gz file.
@Test
public void gzipFileName() throws Exception {
File workingFile = new File( "target", "myTest.csv.gz" );
GZIPOutputStream gzipOutputStream = new GZIPOutputStream( new FileOutputStream( workingFile ) );
PrintWriter writer = new PrintWriter( gzipOutputStream );
writer.println("hello,line,1");
writer.println("hello,line,2");
writer.close();
}
Gzip is purely compression. There is no archive, it's just the file's data, compressed.
The convention is for gzip to append .gz to the filename, and for gunzip to remove that extension. So, logfile.txt becomes logfile.txt.gz when compressed, and again logfile.txt when it's decompressed. If you rename the file, the name information is lost.
