Using FileWriter and keeping LineBreaks/WhiteSpace

Hello, I'm currently using Java's FileWriter with a hash table to recreate a decoded message. I loop through a file that contains all the encoded key-value pairs (for example, 10111 = C) to build the map, then decode the input and write the result to a text file.
My encoded key-value pairs:
https://github.com/DijonLee/Project2/blob/master/freq.txt
prop.load(new FileReader(freqFile));
for (Map.Entry entry : prop.entrySet()) {
// map.put((String) entry.getKey(), (String) entry.getValue());
}
BufferedReader in = new BufferedReader(new FileReader(freqFile));
String line;
while ((line = in.readLine()) != null) {
// System.out.println(line); // ensure my line breaks are okay
if (line.contains("=")) {
String[] strings = line.split("=");
map.put(strings[0], (strings[1]));
}
}
Here is where I attempt to "Decode" my text based on the KV Pairs
FileReader inputStream = null;
inputStream = new FileReader("encoded.txt");
FileWriter outputStream = null;
outputStream = new FileWriter("ur_dec.txt");
int c;
String decoder = "";
String key = "";
while ((c = inputStream.read()) != -1) { // loop through file
char cToChar = (char) c; // get char
decoder += cToChar; // build string
if (map.containsValue(decoder)) {
for (Map.Entry entry : map.entrySet()) {
if (decoder.equals(entry.getValue())) {
key = (String) entry.getKey();
decoder = "";
outputStream.write(key);
break; // stop at the first match because the map is one-to-one
}
}
}
}
Although my program works, it appears to strip tabs and other whitespace that I'd like to keep when I encode and decode, and I'm not sure why.
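The thread does not establish why the whitespace is lost. As a minimal sketch only, the following shows one way to load "code=symbol" lines without trimming and to decode a bit string so that tab and space symbols are written unchanged. The class name WhitespaceSafeDecoder, the helper names, the code-to-symbol map direction, and the one-pair-per-line format split at the first '=' are all assumptions made for illustration; they are not taken from the asker's project.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of the original post.
class WhitespaceSafeDecoder {

    // Parses one "code=symbol" line without trimming it, so a symbol that is
    // a space or a tab is kept. Only the FIRST '=' is treated as the separator.
    static void putPair(Map<String, String> codeToSymbol, String line) {
        int eq = line.indexOf('=');
        if (eq > 0 && eq < line.length() - 1) {
            codeToSymbol.put(line.substring(0, eq), line.substring(eq + 1));
        }
    }

    // Accumulates '0'/'1' characters and writes a symbol as soon as the
    // accumulated bits equal a known code (assumes a prefix-free code table).
    static void decode(Map<String, String> codeToSymbol, Reader in, Writer out)
            throws IOException {
        StringBuilder bits = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c != '0' && c != '1') {
                continue; // ignore anything in the encoded input that is not a bit
            }
            bits.append((char) c);
            String symbol = codeToSymbol.get(bits.toString());
            if (symbol != null) {
                out.write(symbol); // written as-is, including whitespace symbols
                bits.setLength(0);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> table = new HashMap<>();
        putPair(table, "0=a");
        putPair(table, "10=\t"); // tab symbol
        putPair(table, "11= ");  // space symbol
        StringWriter out = new StringWriter();
        decode(table, new StringReader("0100110"), out);
        System.out.println("[" + out + "]");
    }
}
```

Looking up the accumulated bits directly with get avoids scanning every entry with containsValue on each character. A symbol that is itself a line break cannot be stored on a single "code=symbol" line, so it would need some escaping convention that this sketch does not cover.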

Related

Unable to set character encoding in java.util.Scanner

I use Apache Tika to get encoding of file.
FileInputStream fis = new FileInputStream(my_file);
final AutoDetectReader detector = new AutoDetectReader(fis);
fis.close();
System.out.println("Encoding:" + detector.getCharset().toString());
I use Scanner to read values from file.
Scanner scanner = new Scanner(my_file, detector.getCharset().toString());
Map<String, String> values = new HashMap<>();
String line, key = null, value = null;
while (scanner.hasNextLine()) {
line = scanner.nextLine();
if (line.contains(":")) {
if (key != null) {
values.put(key, value.trim());
key = null;
value = null;
}
int indexOfColon = line.indexOf(":");
key = line.substring(0, indexOfColon);
value = line.substring(indexOfColon + 1);
} else {
value += " " + line;
}
}
Scanner is unable to read text from files encoded as windows-1252; I get an empty string.
UPDATE 2018.11.07.
I have the same problem with BufferedReader.
Map<String, String> values = new HashMap<>();
String line, key = null, value = null;
FileInputStream is = new FileInputStream(my_file);
InputStreamReader isr = new InputStreamReader(is, getEncoding(my_file));
BufferedReader buffReader = new BufferedReader(isr);
while ((line = buffReader.readLine()) != null) { // read each line once; calling readLine() twice per pass skips every other line
if (line.contains(":")) {
if (key != null) {
values.put(key, value.trim());
key = null;
value = null;
}
int indexOfColon = line.indexOf(":");
key = line.substring(0, indexOfColon);
value = line.substring(indexOfColon + 1);
} else {
value += " " + line;
}
}

Instead of reading lines, I would try reading the raw bytes and decoding them myself, using the following approach:
ByteArrayOutputStream line = new ByteArrayOutputStream();
try (InputStream in = new BufferedInputStream(new FileInputStream(my_file))) {
int c;
while ((c = in.read()) != -1) { // read one byte at a time
if (c != '\n') {
line.write(c); // collect the bytes of the current line
continue;
}
String output = new String(line.toByteArray(), "windows-1252"); // This should do the trick
// We have a string here, do your logic
line.reset();
}
// Note: a last line without a trailing newline is still in "line" at this point
}
This approach is ugly, but it uses new String, which lets you specify the encoding explicitly. I did not test or run this code at all, but at least it will show you whether any content is actually read properly.
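As a rough alternative sketch, the charset can also be handed to the reader itself through Files.newBufferedReader, which avoids collecting bytes by hand. The class name CharsetLineReader and the temporary demo file are assumptions for illustration, and the thread does not confirm whether this resolves the asker's empty-string problem.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original thread.
class CharsetLineReader {

    // Reads all lines of the file, decoding the bytes with the given charset.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("cp1252-demo", ".txt");
        // The byte 0xE9 is the letter e with an acute accent in windows-1252.
        Files.write(tmp, new byte[] {'n', 'a', 'm', 'e', ':', (byte) 0xE9, '\n'});
        System.out.println(readLines(tmp, Charset.forName("windows-1252")));
        Files.delete(tmp);
    }
}
```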

Converting into a string each of the lines retrieved from reading a text file

How can I convert each of the lines read from a text file into a string? For instance:
RandomAccessFile file = new RandomAccessFile("C:text.txt", "r");
FileChannel channel = file.getChannel();
System.out.println("Size: " + channel.size());
ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
channel.read(buffer);
buffer.flip();//Restore buffer to position 0 to read it
System.out.println("Read ... ");
for (int i = 0; i < channel.size(); i++) {
System.out.print((char) buffer.get());
}
I tried adding the following inside the for loop to get one line at a time in "stringValueOf", but instead it yields each character separately rather than each line.
String stringValueOf = String.valueOf((char) buffer.get());

InputStreamReader is used to read characters. Java provides BufferedReader to make reading lines easy.
The following snippet reads lines from a file and prints them to standard output.
public void printFileLinesToStdOut(File file, Charset charSet)
{
try(BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), charSet)))
{
String line = null;
while((line = reader.readLine()) != null)
{
System.out.println(line);
}
}
catch(IOException e)
{
// TODO : your code here
}
}
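If the FileChannel and ByteBuffer approach from the question is to be kept, one possible sketch is to decode the whole buffer into a String once and then split it on line breaks, instead of casting single bytes to char. The class name ChannelLines is hypothetical, and the sketch assumes the file fits in memory and that the caller knows its charset.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper class, not part of the original thread.
class ChannelLines {

    // Reads the whole file through a FileChannel and returns its lines.
    static String[] readLines(Path file, Charset charset) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            buffer.flip(); // switch the buffer from writing to reading
            String text = charset.decode(buffer).toString();
            return text.split("\\R"); // \R matches any Unicode line break sequence
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("channel-demo", ".txt");
        Files.write(tmp, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(Arrays.toString(readLines(tmp, StandardCharsets.UTF_8)));
        Files.delete(tmp);
    }
}
```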

Java: reading utf-8 file page by page using FileInputStream

I need some code that will allow me to read one page at a time from a UTF-8 file.
I've used this code:
File fileDir = new File("DIRECTORY OF FILE");
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(fileDir), "UTF8"));
String str;
while ((str = in.readLine()) != null) {
System.out.println(str);
}
in.close();
After surrounding it with a try/catch block it runs, but it outputs the entire file!
Is there a way to amend this code to display just ONE PAGE of text at a time?
The file is in UTF-8 format, and after viewing it in Notepad++ I can see that the file contains FF (form feed) characters to denote the next page.

You will need to look for the form feed character by comparing to 0x0C.
For example:
int c = in.read(); // read() returns an int so that -1 can signal end of stream
while ( c != -1 ) {
if ( c == 0x0C ) {
// form feed
} else {
// handle displayable character
}
c = in.read();
}
EDIT added an example of using a Scanner, as suggested by Boris
Scanner s = new Scanner(new File("a.txt")).useDelimiter("\u000C");
while ( s.hasNext() ) {
String str = s.next();
System.out.println( str );
}
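Building on the form feed check above, here is a minimal sketch of returning a single page by index. The class name PageReader, the method name readPage, and the zero-based page index are assumptions for illustration. For a UTF-8 file, the Reader would typically be a BufferedReader around an InputStreamReader created with StandardCharsets.UTF_8.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class, not part of the original thread.
class PageReader {

    // Returns the text of the page with the given zero-based index, where
    // pages are separated by form feed characters, or null if there is no such page.
    static String readPage(Reader in, int pageIndex) throws IOException {
        StringBuilder page = new StringBuilder();
        int current = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\f') {
                if (current == pageIndex) {
                    break; // reached the end of the requested page
                }
                current++;
            } else if (current == pageIndex) {
                page.append((char) c);
            }
        }
        return current == pageIndex ? page.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        String text = "page zero\fpage one\fpage two";
        System.out.println(readPage(new StringReader(text), 1));
    }
}
```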

If the file is valid UTF-8 and the pages are split by U+00FF (that is, (char) 0xFF, "\u00FF", 'ÿ'), then a buffered reader will do. If the separator is a raw byte 0xFF, there is a problem: the byte 0xFF never occurs in valid UTF-8, so the decoder cannot map it to a character.
int soughtPageno = ...; // Counted from 0
int currentPageno = 0;
try (BufferedReader in = new BufferedReader(new InputStreamReader(
new FileInputStream(fileDir), StandardCharsets.UTF_8))) {
String str;
while ((str = in.readLine()) != null && currentPageno <= soughtPageno) {
for (int pos = str.indexOf('\u00FF'); pos >= 0; pos = str.indexOf('\u00FF')) {
if (currentPageno == soughtPageno) {
System.out.println(str.substring(0, pos));
++currentPageno;
break;
}
++currentPageno;
str = str.substring(pos + 1);
}
if (currentPageno == soughtPageno) {
System.out.println(str);
}
}
}
For a byte 0xFF (wrong, hacked UTF-8) use a wrapping InputStream between FileInputStream and the reader:
class PageInputStream extends InputStream { // InputStream is an abstract class, not an interface
InputStream in;
int pageno = 0;
boolean eof = false;
PageInputStream(InputStream in, int pageno) {
this.in = in;
this.pageno = pageno;
}
@Override
public int read() throws IOException {
if (eof) {
return -1;
}
while (pageno > 0) {
int c = in.read();
if (c == 0xFF) {
--pageno;
} else if (c == -1) {
eof = true;
in.close();
return -1;
}
}
int c = in.read();
if (c == 0xFF) {
c = -1;
eof = true;
in.close();
}
return c;
}
}
Take this as an example, a bit more work is to be done.

You can use a Regex to detect form-feed (page break) characters. Try something like this:
File fileDir = new File("DIRECTORY OF FILE");
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(fileDir), "UTF8"));
String str;
Regex pageBreak = new Regex("(^.*)(\f)(.*$)");
while ((str = in.readLine()) != null) {
Match match = pageBreak.Match(str);
bool pageBreakFound = match.Success;
if(pageBreakFound){
String textBeforeLineBreak = match.Groups[1].Value;
//Group[2] will contain the form feed character
//Group[3] will contain the text after the form feed character
//Do whatever logic you want now that you know you hit a page boundary
}
System.out.println(str);
}
in.close();
The parentheses around portions of the Regex denote capture groups, which get recorded in the Match object. The \f matches the form feed character.
Edited: Apologies, the regex code above is C# because I misread the question as C# rather than Java, but the core concept is the same. Here's the Regex documentation for Java: http://docs.oracle.com/javase/tutorial/essential/regex/
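For reference, the same idea can be sketched in Java with java.util.regex. The class name FormFeedRegex and the method splitAtPageBreak are hypothetical, and the pattern is simplified to two capture groups (text before and text after the first form feed) rather than the three used in the answer.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class FormFeedRegex {

    // Group 1: text before the first form feed; group 2: text after it.
    private static final Pattern PAGE_BREAK = Pattern.compile("^(.*?)\\f(.*)$");

    // Returns {before, after} when the line contains a form feed, otherwise null.
    static String[] splitAtPageBreak(String line) {
        Matcher m = PAGE_BREAK.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAtPageBreak("end of page\fstart of next")));
    }
}
```

Because BufferedReader.readLine() does not treat a form feed as a line terminator, a line returned by it can still contain the character, which is what makes a per-line match like this workable.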

Take Strings from Text file and assign each line to value (2 at a time and insert into LinkedHashMap)

I'm trying to load a text file, take the value from each line, and assign it to a variable in my program. Every two lines, I insert them into a LinkedHashMap as a pair.
The problem with a BufferedReader is that all I can seem to do is read one line at a time.
Here is my current code:
public static void receiver(String firstArg) {// Receives
// Input
// File
String cipherText;
String key;
String inFile = new File(firstArg).getAbsolutePath();
Path file = new File(inFile).toPath();
// File in = new File(inFile);
try (InputStream in = Files.newInputStream(file);
BufferedReader reader = new BufferedReader(
new InputStreamReader(in))) {
String line = null;
while ((line = reader.readLine()) != null) {
// System.out.println(line);
String[] arrayLine = line.split("\n"); // here you are
// splitting
// with whitespace
cipherText = arrayLine[0];
// key = arrayLine[1];
System.out.println(arrayLine[0] + " " + arrayLine[1]);
cipherKeyPairs.put(arrayLine[0], arrayLine[1]);
}
} catch (IOException x) {
System.err.println(x);
}
The problem is that it can't find arrayLine[1] (for obvious reasons). I need it to read two lines at a time without the array going out of bounds.
Any idea how to do this, so that I can store them in my LinkedHashMap two lines at a time, as separate values?

You can overcome this by inserting into the map once every two lines read, using a boolean flag (secondLine) to track which line of the pair is being read:
Read the first line (secondLine is false) ==> save the line in the cipherText variable and set secondLine = true.
Read the second line (secondLine is true) ==> add (cipherText, line) to the map and set secondLine = false.
Repeat for each following pair of lines.
String cipherText = null;
boolean secondLine = false;
String inFile = new File(firstArg).getAbsolutePath();
Path file = new File(inFile).toPath();
try (InputStream in = Files.newInputStream(file);
BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
String line = null;
while ((line = reader.readLine()) != null) {
if (!secondLine) //first line reading
{
cipherText = line;
secondLine = true;
}
else if (secondLine) //second line reading
{
cipherKeyPairs.put(cipherText, line);
secondLine = false;
}
}
} catch (IOException x) {
System.err.println(x);
}

See if this works for you. I just edited your code; it might not be the best answer.
public static void receiver(String firstArg) {// Receives
// Input
// File
String cipherText;
String key;
String inFile = new File(firstArg).getAbsolutePath();
Path file = new File(inFile).toPath();
// File in = new File(inFile);
try (InputStream in = Files.newInputStream(file);
BufferedReader reader = new BufferedReader(
new InputStreamReader(in))) {
String line = null;
List<String> lines = new ArrayList<>();
while ((line = reader.readLine()) != null) {
lines.add(line); // trim the line first, though, and check for an empty string
}
for (int i = 1; i < lines.size(); i += 2) { // step by two: lines i-1 and i form one pair
cipherText = lines.get(i - 1);
key = lines.get(i);
System.out.println(cipherText + " " + key);
cipherKeyPairs.put(cipherText, key);
}
} catch (IOException x) {
System.err.println(x);
}
}
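Another way to pair consecutive lines, shown here only as a sketch, is to call readLine() a second time inside the loop, so each iteration consumes one full pair. The class name LinePairs and the choice to drop an unpaired trailing line are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the original thread.
class LinePairs {

    // Reads two lines per iteration and stores them as one (first, second) entry.
    static Map<String, String> readPairs(BufferedReader reader) throws IOException {
        Map<String, String> pairs = new LinkedHashMap<>();
        String first;
        while ((first = reader.readLine()) != null) {
            String second = reader.readLine();
            if (second == null) {
                break; // odd number of lines: the unpaired last line is dropped here
            }
            pairs.put(first, second);
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(
                new StringReader("cipher1\nkey1\ncipher2\nkey2\n"));
        System.out.println(readPairs(reader));
    }
}
```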

how to loop and generate hash key for every word in file

How do I loop so that I can generate a hash code for every word in the file (.txt)?
I'm already able to generate a single hash code for the whole file.
The loop below reads and prints the contents of the text document,
but I'm unable to work the hash generation into it.
public static void main(String[] args) throws NoSuchAlgorithmException, IOException {
JFileChooser chooser=new JFileChooser();
int returnVal = chooser.showOpenDialog(null);
if (returnVal == JFileChooser.APPROVE_OPTION) {
File f = chooser.getSelectedFile();
}
FileInputStream fin = new FileInputStream(chooser.getSelectedFile());
DataInputStream din = new DataInputStream(fin);
BufferedReader br = new BufferedReader(new InputStreamReader(din));
ArrayList<String> list = new ArrayList<String> ();
MessageDigest md = MessageDigest.getInstance("MD5");
String currentLine;
byte[] buf = new byte[8192];
int len = 0;
while ((currentLine = br.readLine()) != null) {
list.add(currentLine);
md.update(buf, 0, len);
System.out.println(currentLine);
}
br.close();
byte[] bytes = md.digest();
StringBuilder sb = new StringBuilder(2 * bytes.length);
for (byte b : bytes) {
sb.append("0123456789ABCDEF".charAt((b & 0xF0) >> 4));
sb.append("0123456789ABCDEF".charAt((b & 0x0F)));
}
String hex = sb.toString();
System.out.println (buf);
System.out.println(sb);
}

At a high level, follow these steps:
Read the file line by line.
Split each line on \\s+ (whitespace).
Now you have all the words in an array; iterate over it.
For each string (word), call word.hashCode().
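The steps above can be sketched roughly as follows. The class name WordHashCodes and the choice to collect the results in a LinkedHashMap are assumptions for illustration. Note that String.hashCode() is a plain 32-bit hash, not a cryptographic digest such as MD5.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the original answer.
class WordHashCodes {

    // Maps each distinct word to its String.hashCode() value, in first-seen order.
    static Map<String, Integer> hashWords(BufferedReader reader) throws IOException {
        Map<String, Integer> hashes = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) { // an empty line yields one empty token
                    hashes.put(word, word.hashCode());
                }
            }
        }
        return hashes;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(
                new StringReader("the quick brown fox\njumps over the lazy dog\n"));
        hashWords(reader).forEach((word, hash) -> System.out.println(word + " -> " + hash));
    }
}
```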

Try using a tokenizer like this:
StreamTokenizer tokenizer = new StreamTokenizer(new FileReader("yourFilePath.txt"));
tokenizer.eolIsSignificant(false);
int token = tokenizer.nextToken();
while (token != StreamTokenizer.TT_EOF) {
if (token == StreamTokenizer.TT_WORD) {
System.err.println(tokenizer.sval.hashCode()); // here use any hash method you like
}
token = tokenizer.nextToken();
}

You generate a list of all the lines in the file, which you then never seem to use. Maybe you should generate a list of all the words in the file by splitting each line on whitespace:
for (String word : currentLine.split("\\s+")) {
list.add(word);
}
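Then you can use this list to create a list of hashes, one for each word:
// requires: import java.nio.charset.StandardCharsets;
List<byte[]> hashes = new ArrayList<byte[]>(list.size());
for (String word : list) {
md.reset();
hashes.add(md.digest(word.getBytes(StandardCharsets.UTF_8))); // digest() takes bytes, not a String
}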
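As a self-contained sketch of the per-word digest, the following computes an MD5 hex string for a single word, reusing the hex conversion from the question. The class name WordDigests, the method name md5Hex, and the use of UTF-8 to turn a word into bytes are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the original answer.
class WordDigests {

    private static final String HEX = "0123456789ABCDEF";

    // Returns the MD5 digest of the word's UTF-8 bytes as uppercase hex.
    static String md5Hex(String word) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(word.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(2 * digest.length);
        for (byte b : digest) {
            sb.append(HEX.charAt((b & 0xF0) >> 4)); // high nibble
            sb.append(HEX.charAt(b & 0x0F));        // low nibble
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        for (String word : "one word per digest".split("\\s+")) {
            System.out.println(word + " -> " + md5Hex(word));
        }
    }
}
```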
