Split XML file into multiple files using buffWrite or FileWriter - java

I have an XML file that contains different types of medical data. I am trying to split it into multiple files, by tag name or by row. The XML data is generated from a CSV file, and it looks like this:
Xml File Data
CSV file that Im getting data from
Once split, the files should be named descriptor_pdx01.xml, descriptor_pdx02.xml, descriptor_pdx03.xml, and so on.
I am trying to accomplish this in Java using BufferedWriter or FileWriter. Below is the code I wrote to write the XML file. What I cannot figure out is how to split the output into multiple XML files using BufferedWriter and FileWriter.
This is the content of the CSV FIle:
TESTCLIENT_2017_09_30 pdx01 Carcinosarcoma C34448 M 12
TESTCLIENT_2017_09_30 pdx02 Esophageal carcinoma C4025
TESTCLIENT_2017_09_30 pdx03 Esophageal carcinoma C4025 45
TESTCLIENT_2017_09_30 pdx04 Carcinosarcoma C34448 F
TESTCLIENT_2017_09_30 pdx05 Esophageal carcinoma C4025 F 60
TESTCLIENT_2017_09_30 pdx06 Esophageal carcinoma C4025 F 66
TESTCLIENT_2017_09_30 pdx07 Carcinosarcoma C34448 M 70
public static void writeXmlFile(String[] headers, ArrayList<String> stringsFromCsv, String fileName) {
    try {
        BufferedWriter buffWrite = new BufferedWriter(new FileWriter(fileName));
        buffWrite.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n");
        buffWrite.write("<patient>\r\n");
        // String array that is the same size as that of the string from the CSV
        // For each string in the CSV file
        for (String s : stringsFromCsv) {
            buffWrite.write("\t<person>\r\n");
            // Split the line into an array of strings
            String[] fields = s.split(",");
            String[] stringToWrite = fields;
            // Initialise a string variable, build the XML from the field array values and assign it to that variable
            // Do a method call and send the string and the file name (descriptor_ + fields[1])
            // For each item in that array of strings
            for (int i = 0; i < fields.length; i++) {
                // Define a String and keep adding to it from the field array; the String should be declared outside the for loop
                // Write the corresponding header to the file, as well as the value from the array 'fields'
                buffWrite.write("\t\t<" + headers[i] + ">" + fields[i] + "</" + headers[i] + ">\r\n");
            }
            buffWrite.write("\t</person>\r\n");
        }
        // The closing tag must match the <patient> root element opened above
        buffWrite.write("</patient>");
        buffWrite.close();
    } catch (IOException ioe) {
        System.err.println("Error while writing to xml file in writeXmlFile: ");
        ioe.printStackTrace();
    }
}
If you could offer any guidance on how to do this it would be greatly appreciated.

There are a lot of issues in your code, as some people pointed out in the comments on your question.
I would use a lightweight library to map both your CSV and XML files to a POJO.
My suggestion? BeanIO is a very good library for reading and writing CSV and XML files.
Write your POJO and use BeanIO's @Record and @Field annotations. BeanIO will gracefully parse both CSV and XML into your POJOs, so you can do whatever you need with them.
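If you prefer to stay with plain BufferedWriter as asked in the question, the split itself can be sketched as follows. This is only a minimal sketch under a few assumptions: the CSV lines are comma-separated, the second column (pdx01, pdx02, ...) is the value that goes into the file name, and the header names in main are hypothetical because the question does not show them. DescriptorSplitter, escapeXml, toXml and splitToFiles are illustrative names, not part of any library.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class DescriptorSplitter {

    // Escape the characters that would otherwise break XML text content.
    static String escapeXml(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Build a complete XML document for a single CSV row.
    static String toXml(String[] headers, String[] fields) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<patient>\n");
        // A row may hold fewer values than there are headers, so stop at the shorter length.
        int count = Math.min(headers.length, fields.length);
        for (int i = 0; i < count; i++) {
            xml.append("\t<").append(headers[i]).append('>')
               .append(escapeXml(fields[i]))
               .append("</").append(headers[i]).append(">\n");
        }
        xml.append("</patient>\n");
        return xml.toString();
    }

    // Write one file per CSV row, named descriptor_<second column>.xml (assumed naming rule).
    static void splitToFiles(String[] headers, List<String> rows, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        for (String row : rows) {
            String[] fields = row.split(",", -1); // -1 keeps trailing empty columns
            if (fields.length < 2) {
                continue; // no id column, so no file name can be derived
            }
            Path target = outputDir.resolve("descriptor_" + fields[1].trim() + ".xml");
            // A new writer per row is what produces one file per row.
            try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                writer.write(toXml(headers, fields));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical header names; replace them with the real CSV header.
        String[] headers = {"client", "id", "diagnosis", "code", "sex", "age"};
        List<String> rows = Arrays.asList(
            "TESTCLIENT_2017_09_30,pdx01,Carcinosarcoma,C34448,M,12",
            "TESTCLIENT_2017_09_30,pdx02,Esophageal carcinoma,C4025,,");
        splitToFiles(headers, rows, Paths.get("out"));
    }
}
```

The key change from the original method is that the writer is opened inside the loop, once per row, instead of once for the whole CSV. The header names are still written as element names without any validation, so they must already be legal XML names.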

Related

how to rewrite csv file in java without double quotes

original csv
reformatted csv
I crawled the data I wanted from the web and saved it as a CSV file, as shown above.
However, when I filtered out the rows that I do not want in my data set and saved the file again, I got unwanted quotation marks (like """) in every column of my data.
How do I fix this problem?
(I want all the column names to keep the same format as above, with a single pair of quotation marks: "column name".)
BufferedReader br = Files.newBufferedReader(Paths.get("FilePath"));
CSVWriter cw = new CSVWriter(new OutputStreamWriter(new FileOutputStream("FileName.csv", true), StandardCharsets.UTF_8));
String val = "";
while ((val = br.readLine()) != null) {
    if (!val.contains("some keywords")) {
        continue;
    }
    cw.writeNext(new String[]{val});
}
cw.flush();

how to write a string which starts with - into csv file?

I am trying to write data to a CSV file.
A string value that starts with - is automatically converted to #NAME? when I open the CSV file after writing it. For example, if I write test it is displayed correctly, but if I write -test the value shown is #NAME?. It is not a code issue; the value that starts with - is turned into an error (#NAME?) when the file is opened. How can I correct this programmatically? Below is the code:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class FileWriterTest {
    public static void main(String[] args) {
        BufferedWriter bufferedWriter = null;
        File file = new File("test.csv");
        try {
            bufferedWriter = new BufferedWriter(new FileWriter(file));
            List<String> records = getRecords();
            // Write one record per line
            for (String record : records) {
                bufferedWriter.write(record);
                bufferedWriter.newLine();
            }
            bufferedWriter.flush();
            System.out.println("Completed writing data to a file.");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (bufferedWriter != null)
                    bufferedWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static List<String> getRecords() {
        List<String> al = new ArrayList<String>();
        String s1 = "test";
        String s2 = "-test";
        al.add(s1);
        al.add(s2);
        return al;
    }
}
Could you please assist?
It's a problem with Excel. When you open a CSV file in Excel, it tries to determine the cell type automatically, and that guess often fails. The CSV file is all right; the editor is not ;)
You can either right-click the cell, select Format Cells and set its format to Text (you might also need to remove the automatically inserted '=' sign), or open the CSV file via Data - From Text/CSV and select the proper column types in the wizard.
In the formal CSV standard, you can do this by putting quotes (") around the field value. They're a text delimiter (as opposed to other kinds of values, like numeric ones).
It sounds like you're using Excel. You may need to enable " as a text delimiter/indicator.
Update: If you double-click the .csv to open it in Excel, even this doesn't work. You have to open a workbook and then import the CSV data into it. (Pathetic, really...)
I have a relatively old version of Excel (2007), and the following works perfectly:
Put the text between double quotes and precede it with an equals sign.
I.e., -test becomes ="-test".
Your file will therefore look like this:
test1,test2,test3
test4,="-test5",test6
UPDATE
Works in Excel-2010 as well.
As Veselin Davidov mentioned, this will break the csv standard but I don't know whether that's a problem.
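Applied to the writer loop from the question, the ="-test" workaround could be sketched like this. It is only an illustration: ExcelSafeCsv and protectForExcel are hypothetical names, the sketch only looks at a leading -, and it assumes the values contain no commas or double quotes of their own.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class ExcelSafeCsv {

    // Wrap a value that starts with '-' as ="value" so Excel displays it as text.
    static String protectForExcel(String field) {
        if (field.startsWith("-")) {
            return "=\"" + field + "\"";
        }
        return field;
    }

    public static void main(String[] args) throws IOException {
        List<String> records = Arrays.asList("test", "-test");
        // try-with-resources flushes and closes the writer even if a write fails
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("test.csv"))) {
            for (String record : records) {
                writer.write(protectForExcel(record));
                writer.newLine();
            }
        }
    }
}
```

Because the result is no longer standard CSV, this is best limited to files that are only ever opened in Excel; other CSV consumers will see the literal ="-test" text.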

Loop each line of CSV and remove up to the Nth comma in Java

I'm consuming a CSV that is generated by an external process. This CSV goes to different places and requires different columns to be included or excluded.
An example of the difference in files...
File 1:
Col1,Col2,Col3,Col4,Col5
ABC,DEF,GHI,JKL,MNO
File 2:
Col4,Col5
JKL,MNO
Pseudo:
1. Open the initial CSV file and create a new CSV file.
2. Loop through the CSV file and for each line copy the columns needed
3. Drop new file in new location
I'm stuck copying the right columns or just removing them. Is there an easy way to loop through each row and just remove data up to a certain comma?
Split the CSV by commas and take the columns you need.
In this demo I have only shown for one line of CSV but you can extend this program to handle multiple lines.
import java.util.*;
import java.io.*;

// Class name chosen for illustration
class Main
{
    public static void main (String[] args) throws java.lang.Exception
    {
        // Read a file into inputCsv
        String inputCsv = "c0,c1,c2,c3";
        String outputCsv = "";
        // Zero-based indexes of the columns to keep
        int[] colsNeeded = {1,3};
        String[] cols = inputCsv.split(",");
        for(int i = 0; i < colsNeeded.length; i++){
            outputCsv += cols[colsNeeded[i]];
            // Add a separator after every column except the last one
            if(i < colsNeeded.length - 1)
                outputCsv += ",";
        }
        System.out.println(outputCsv);
        // Write output Csv onto some file
    }
}
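Extending the single-line demo to every line of a file could look roughly like the sketch below. It keeps the same split-by-comma idea, so it assumes that no field contains a quoted comma and that every line has at least as many columns as the highest index requested. ColumnFilter and copyColumns are illustrative names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class ColumnFilter {

    // Copy only the requested zero-based columns of every input line to the output.
    static void copyColumns(Reader in, Writer out, int[] colsNeeded) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            String[] cols = line.split(",", -1); // -1 keeps trailing empty columns
            StringBuilder kept = new StringBuilder();
            for (int i = 0; i < colsNeeded.length; i++) {
                if (i > 0) {
                    kept.append(',');
                }
                kept.append(cols[colsNeeded[i]]);
            }
            writer.write(kept.toString());
            writer.write('\n');
        }
        writer.flush(); // the caller owns and closes the underlying streams
    }

    public static void main(String[] args) throws IOException {
        String input = "Col1,Col2,Col3,Col4,Col5\nABC,DEF,GHI,JKL,MNO\n";
        StringWriter output = new StringWriter();
        // Keep the fourth and fifth columns (indexes 3 and 4)
        copyColumns(new StringReader(input), output, new int[] {3, 4});
        System.out.print(output);
    }
}
```

For real files, pass a FileReader and a FileWriter (or the Files.newBufferedReader / Files.newBufferedWriter equivalents) instead of the string-based reader and writer used in main.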
Just use univocity-parsers for that:
String input = "Col1,Col2,Col3,Col4,Col5\n" +
"ABC,DEF,GHI,JKL,MNO\n";
Reader inputReader = new StringReader(input); //reading from your input string. Use FileReader for files
Writer outputWriter = new StringWriter(); //writing into another string. Use FileWriter for files.
CsvParserSettings parserSettings = new CsvParserSettings(); //configure the parser
parserSettings.selectFields("Col4", "Col5"); //select fields you need here
//For convenience, just use ready to use routines.
CsvRoutines routines = new CsvRoutines(parserSettings);
//call parse and write to read the selected columns and write them to the output
routines.parseAndWrite(inputReader, outputWriter);
//print the result
System.out.println(outputWriter);
Output:
Col4,Col5
JKL,MNO
Hope this helps.
Disclaimer: I'm the author of this library. It's open-source and free (Apache 2.0 license).

Japanese character not showing properly converting CSV file

I am converting a CSV file from the Tatoeba project. It contains Japanese characters. I am inserting the data into an SQLite database. The insertion goes through without a problem, but the characters are not displayed properly.
If I insert directly:
String str = content_parts[2];
sentence.setValue(str);
Getting values like this:
ãã¿ã«ã¡ãã£ã¨ãããã®ããã£ã¦ãããã
I have tried to decode to UTF8 from JIS:
String str = content_parts[2];
byte[] utf8EncodedBytes = str.getBytes("JIS");
String s = new String(utf8EncodedBytes, "UTF-8");
sentence.setValue(s);
JIS:
$B!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!r!)!)!/!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)!r!)!)!)!)!)!)!)!)!)!)!)!)!)!)!)(B
Shift-JIS:
????\??????�N?�}??????????????????��?????�N?�N???��??????
Shift_JIS:
????\????????????????????????��?�N??????????????????��??????
CSV file (when opened by Excel 2010)
n きみにちょっとしたものをもってきたよ。
What am I doing wrong? How can I solve this problem?
If you are still searching for a solution, refer to the links below:
setting-a-utf-8-in-java-and-csv-file and handle Japanese characters
csv-reports-not-displaying-japanese-characters
In brief, write the BOM (byte order mark) bytes to your file output stream before passing it to the OutputStreamWriter.
String content="some string to write in file(in any language)";
FileOutputStream fos = new FileOutputStream("D:\\csvFile.csv");
fos.write(239);
fos.write(187);
fos.write(191);
Writer w = new BufferedWriter(new OutputStreamWriter(fos, StandardCharsets.UTF_8));
w.write(content);
w.close();
Hope this will help

Reading from a file in Java

I wrote a program that reads from a text file using Java. The file has 1 column with a lot of integer values and each value is being added to an array list. However, when I print the array list, between each number I am getting an empty entry. For example if in text file I have:
4
55
I am getting:
1 : ÿþ4 (Also I do not know what this weird character is)
2 :
3 : 555
Code:
import java.io.*;
import java.util.Scanner;
import java.util.ArrayList;

public class ReadFile {
    public static void main(String[] args) {
        try
        {
            Scanner input = new Scanner("ReadingFile.txt");
            File file = new File(input.nextLine());
            input = new Scanner(file);
            ArrayList numbers = new ArrayList();
            int i=1;
            while (input.hasNextLine()) {
                String line = input.nextLine();
                numbers.add(line);
                System.out.println(i + " : " + line);
                i++;
            }
            input.close();
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
    }
}
I tried to avoid using the arraylist and just do :
System.out.println(i + " " + line);
however this problem is still there so I am guessing that it is not an ArrayList problem.
Provided your text file is actually a good text file, it could be a character encoding thing. You need to provide the correct character set to your scanner in its constructor. So change the line:
input = new Scanner(file);
Into something like:
String charset = "UTF-8";
input = new Scanner(file, charset);
Of course, you need to figure out which character set your file is actually stored in and use that one. I use UTF-8 here only as an example.
OK, the problem is that you're actually reading binary data from an Excel file, hence the strange characters. If you want to read an Excel file directly, use a library such as JXL (http://jexcelapi.sourceforge.net/). Here's a good tutorial for that API: http://www.vogella.com/tutorials/JavaExcel/article.html
Otherwise, export your Excel file to CSV format and read that file with your code.
The weird characters are probably a writeUTF prefix or a BOM, so the right way to read the file depends on how it was written.
If you wrote the file with DataOutputStream and called writeUTF, then you should read it with readUTF.
If it is a simple text file written by a text editor such as Notepad++, I suggest calling trim() on every line.
Looks like your file is UTF-16. These two characters are the Byte order mark of UTF-16.
You must specify that when constructing your Scanner.
final Scanner scanner = new Scanner(file, "UTF-16");
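As a self-contained check of this, the sketch below first writes a small little-endian UTF-16 file that starts with the FF FE byte order mark (the two bytes that show up as ÿþ when read with a single-byte charset) and then reads it back with the charset given explicitly. Utf16Reader and readLines are illustrative names, and the temporary file only stands in for the real ReadingFile.txt.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class Utf16Reader {

    // Read every non-blank line of a UTF-16 text file; the decoder consumes the BOM.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, "UTF-16")) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                if (!line.isEmpty()) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real input: BOM FF FE followed by little-endian UTF-16 text.
        File file = File.createTempFile("numbers", ".txt");
        file.deleteOnExit();
        try (OutputStream out = new FileOutputStream(file)) {
            out.write(new byte[] {(byte) 0xFF, (byte) 0xFE});
            out.write("4\r\n55\r\n".getBytes(StandardCharsets.UTF_16LE));
        }
        System.out.println(readLines(file)); // prints [4, 55]
    }
}
```

The apparently empty entries in the original output are most likely the zero bytes that sit between the carriage return and the line feed when each UTF-16 character is read as two separate single-byte characters.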
If you don't have Notepad++ (text editor) download it. Open your generated text file using it.
Do find/Replace and populate the fields and check the settings by looking at the image below. then press Replace All. And then save your file. Your text file will be clean.
