Writing Korean characters to a zipped CSV file in Java
I'm writing a csv file that I also need to zip, and am using java.util.zip.ZipEntry and java.util.zip.ZipOutputStream.
It all works great when I have Western characters in all the columns, but when I use Korean characters it fails to recognize the \n, and everything appears garbled and on the same row. I'm writing the text as UTF-8, which I expect covers Korean.
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;
public class CreateCSV {
public static void main(String[] args) throws IOException {
DateTime utcDateTime = new DateTime().toDateTime(DateTimeZone.UTC);
DateTime newDateTime = utcDateTime.toDateTime();
DateTimeFormatter dateFormatter = DateTimeFormat.forPattern("yyyyMMdd-HHmmss-SSS-");
File zipFile = new File("C:/TestCSVKorean/"+ dateFormatter.print(newDateTime) + "Export.zip");
FileOutputStream fileOutputStream = new FileOutputStream(zipFile);
// Open up the zipfile and create the csv entry
ZipOutputStream zos = new ZipOutputStream(new BufferedOutputStream(fileOutputStream));
zos.putNextEntry(new ZipEntry(dateFormatter.print(newDateTime) + "tics.csv"));
// The first line of the CSV is a header line
StringBuffer csvHeader = new StringBuffer(
"Time,Name,Rev,Appme,EnvName,PlanName,PlanRev,"
+ "Og Name,Op Name,"
+ "TTR,Lat,Bys Rec,Bytt,"
+ "RPl,Request Method,URI Path,Query String,HTTP Status Code,"
+ "HTTP Request Headers,User Agent,Request Body,HTTP Response Headers,Response Body\n");
zos.write(csvHeader.toString().getBytes(), 0, csvHeader.length());
StringBuffer csvData = new StringBuffer("");
csvData.append("\"" + newDateTime + "\",\"" +
"apiName" + "\"," +
"2.0.0" + ",\"" +
"app name" + "\",\"" +
"env name" + "\",\"" +
"plan name" + "\"," +
"2" + ",\"" +
"dev org name" + "\",\"" +
"ìº˜ë¦°ë” ëª©ë¡ì¡°íšŒ(ë‚´ 캘린ë”, 구ë…가능한 캘린ë”, 시스템 캘린ë”, ê´€ë¦¬ìž ìº˜ë¦°ë”)" + "\"," +
"123" + "," +
"inifd: 334;dshs: 343" + ", " +
"10" + "," +
"33" + ",\"" +
"http" + "\",\"" +
"GET" + "\",\"" +
"/dsfs/sdf/ds" + "\",\"" + "query string" + "\",\"" +
"200" + "\",\"" +
"jshkshdf" + "\",\"" +
"sdjhfks/sdfs/" + "\",\"" +
"jhksdfhks dsfs" + "\",\"" +
"dsfsdfs" + "\",\"" +
"dsfsfs" + "\"\n");
zos.write(csvData.toString().getBytes("UTF-8"), 0, csvData.length());
csvData = new StringBuffer("");
csvData.append("\"" + newDateTime + "\",\"" +
"apiName" + "\"," +
"2.0.0" + ",\"" +
"app name" + "\",\"" +
"env name" + "\",\"" +
"plan name" + "\"," +
"2" + ",\"" +
"dev org name" + "\",\"" +
"ìº˜ë¦°ë” ëª©ë¡ì¡°íšŒ(ë‚´ 캘린ë”, 구ë…가능한 캘린ë”, 시스템 캘린ë”, ê´€ë¦¬ìž ìº˜ë¦°ë”)" + "\"," +
"123" + "," +
"inifd: 334;dshs: 343" + ", " +
"10" + "," +
"33" + ",\"" +
"http" + "\",\"" +
"GET" + "\",\"" +
"/dsfs/sdf/ds" + "\",\"" + "query string" + "\",\"" +
"200" + "\",\"" +
"jshkshdf" + "\",\"" +
"sdjhfks/sdfs/" + "\",\"" +
"jhksdfhks dsfs" + "\",\"" +
"dsfsdfs" + "\",\"" +
"dsfsfs" + "\"\n");
zos.write(csvData.toString().getBytes("UTF-8"), 0, csvData.length());
zos.close();
}
}
This is what I see when I open the csv file:
Time Name Rev Appme EnvName PlanName PlanRev Og Name Op Name TTR Lat Bys Rec Bytt RPl Request Method URI Path Query String HTTP Status Code HTTP Request Headers User Agent Request Body HTTP Response Headers Response Body
2016-01-28T17:20:56.859Z apiName 2.0.0 app name env name plan name 2 dev org name 캘린ë†목ë¡Âì¡°ÃÅ¡Å’(ë‚´ 캘린ëÂâ€, 구ëÂ…가능Õœ 캘린ëÂâ€, 시스Ã…œ 캘린ëÂâ€, 관리잠캘린ëÂâ€) 123 inifd: 334;dshs: 343 10 33 http 2016-01-28T17:20:56.859Z apiName 2.0.0 app name env name plan name 2 dev org name 캘린ë†목ë¡Âì¡°ÃÅ¡Å’(ë‚´ 캘린ëÂâ€, 구ëÂ…가능Õœ 캘린ëÂâ€, 시스Ã…œ 캘린ëÂâ€, 관리잠캘린ëÂâ€) 123 inifd: 334;dshs: 343 10 33 http
It is sticking the date field from the second row into the Request method field of the first: 2016-01-28T17:20:56.859Z
First, you should get out of the habit of using StringBuffer. It's a legacy class whose methods are synchronized, which you rarely need. If you need to append text little by little, you would normally use StringBuilder instead.
In your case, however, you don't need StringBuilder or StringBuffer. Just use the string:
String csvHeader =
"Time,Name,Rev,Appme,EnvName,PlanName,PlanRev,"
+ "Og Name,Op Name,"
+ "TTR,Lat,Bys Rec,Bytt,"
+ "RPl,Request Method,URI Path,Query String,HTTP Status Code,"
+ "HTTP Request Headers,User Agent,Request Body,HTTP Response Headers,Response Body\n";
And…
String csvData = "\"" + newDateTime + "\",\"" +
"apiName" + "\"," +
"2.0.0" + ",\"" +
"app name" + "\",\"" +
// etc.
Second, be careful not to confuse byte count with character count. When you convert a String to bytes using the UTF-8 charset, every character outside the US-ASCII range (0-127) is converted to more than one byte. Therefore, the number of bytes will be larger than the String's length (which represents how many characters it contains, not how many bytes it takes up when encoded in UTF-8). Because you pass csvData.length() as the byte count, the write stops before the end of the encoded array, which cuts off the trailing \n and is why the rows run together.
So your write operation should just be:
zos.write(csvData.toString().getBytes("UTF-8"));
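To make the difference between the two counts concrete, here is a minimal sketch. The class and method names are made up for illustration; only String.length() and String.getBytes(Charset) are standard API.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Lengths {
    // Number of char values in the string, which is what String.length() reports.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes the same string occupies once encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Two Hangul syllables followed by a newline.
        String row = "\uce98\ub9b0\n";
        System.out.println(charCount(row));     // prints 3
        System.out.println(utf8ByteCount(row)); // prints 7
    }
}
```

With these numbers, write(bytes, 0, row.length()) would emit only the first 3 of the 7 bytes, so the newline at the end would never reach the file.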
Third, I don't know Korean, but I know what Hangul characters look like, and I don't see any in your code. I assume you intended these to be Hangul:
"ìº˜ë¦°ë” ëª©ë¡ì¡°íšŒ(ë‚´ 캘린ë”, 구ë…가능한 캘린ë”, 시스템 캘린ë”, ê´€ë¦¬ìž ìº˜ë¦°ë”)" + "\"," +
It appears you're using Windows to place each individual UTF-8 byte in your String as if it were a character. But in Java, bytes are not characters, and are not interchangeable with characters.
I assume you are using Windows because the third character, the spacing Unicode character SMALL TILDE, is \u02dc, which would normally take up two bytes, but in the windows-1252 encoding it is the single byte 0x98.
So, if I assume you derived those characters from the UTF-8 bytes of Hangul characters, the first six bytes in the above string are:
ec ba 98 eb a6 b0
ì º ˜ ë ¦ °
Those bytes are the UTF-8 representation of the two Hangul characters U+CE98 and U+B9B0. The correct way to place those two characters in a Java string is:
"\uce98\ub9b0"
If you have the original Hangul text in a file, you can easily convert the entire text to a series of Java escape sequences like the above line, using the native2ascii tool that ships with JDK 8 and earlier (it was removed in JDK 9). Such a command might look like:
native2ascii -encoding UTF-8 hangul.txt hangulstrings.java
An alternative approach which I don't recommend, if you don't want to be bothered to write your Strings correctly, is to force your current "pseudo-bytes" string to be interpreted as UTF-8 bytes by recognizing that it contains Windows-1252 characters representing bytes and restoring it to those bytes:
zos.write(csvData.getBytes("windows-1252"));
The resulting zip entry will still be encoded in UTF-8, since your bytes are a UTF-8 representation of your Hangul text. So you need to make sure you open the file using a tool that recognizes that the file is UTF-8.
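As a rough illustration of that round trip, the sketch below turns the six "pseudo-byte" characters discussed above back into the two Hangul syllables. The class and method names are hypothetical. Note that the trick is lossy: windows-1252 leaves a few byte values (0x8D, for example) undefined, so any UTF-8 byte with such a value cannot be recovered, which appears to be why some syllables in the question's string are a character short.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {
    // Reinterpret a string whose chars are really windows-1252 views of UTF-8 bytes.
    static String repair(String pseudoBytes) {
        // Step 1: map each char back to the single byte it stood for.
        byte[] raw = pseudoBytes.getBytes(Charset.forName("windows-1252"));
        // Step 2: decode those bytes as the UTF-8 text they were all along.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The six garbled characters shown above, written as escapes.
        String garbled = "\u00ec\u00ba\u02dc\u00eb\u00a6\u00b0";
        System.out.println(repair(garbled).equals("\uce98\ub9b0")); // prints true
    }
}
```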
Windows is not especially good at recognizing a UTF-8 file. Notepad is especially poor at it. One way to signal to Windows that a file is a UTF-8 file is to write a Byte Order Mark character as the first character in the file:
String csvHeader = "\ufeff"
+ "Time,Name,Rev,Appme,EnvName,PlanName,PlanRev,"
// etc.
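Putting the pieces together, one way to avoid counting bytes by hand is to wrap the ZipOutputStream in an OutputStreamWriter and let it do the encoding. This is only a sketch: it writes to an in-memory buffer rather than a file so it can be run anywhere, and the class name, method names, and entry name are assumptions, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZippedCsv {
    // Build an in-memory zip holding one UTF-8 CSV entry that starts with a BOM.
    static byte[] zipCsv(String entryName, String csvText) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            zos.putNextEntry(new ZipEntry(entryName));
            Writer writer = new OutputStreamWriter(zos, StandardCharsets.UTF_8);
            writer.write('\ufeff'); // byte order mark, so Windows tools detect UTF-8
            writer.write(csvText);
            writer.flush();         // push the encoded bytes into the entry
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the first entry back as raw bytes so the encoding can be inspected.
    static byte[] firstEntryBytes(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            zis.getNextEntry();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = zis.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] zip = zipCsv("tics.csv", "Op Name\n\"\uce98\ub9b0\"\n");
        System.out.println(firstEntryBytes(zip).length + " bytes in the CSV entry");
    }
}
```

To write a real file, the ByteArrayOutputStream could be swapped for a FileOutputStream as in the question.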
Related
Encoding special characters in wiremock response
I am facing a scenario where the character "�" occasionally gets returned from my okhttp requests, and the character is causing some downstream issues. So I have added code to remove this character should it exist and I would like to add a test case to ensure this works correctly. The issue is that wiremock does not seem to like this special character. Normally I would pull out the data from the response as so:
String stringifiedResponse = response.getResponseString();
if (response.isSuccessful()) {
custResp = response.getData();
}
Normally this works fine for all my requests. However, when I set up wiremock to return a response with the special character (even as a single one, and I would like to test with many different fields), the stringified response does have the response but the data is null. This is how I have set up the mocks in my test class:
public static void mockCPInvalidChars(String ssn) {
String customerPrefillPrimaryOwnerRequest = " {\n" + " \"customers\": [\n" + " {\n" + " \"partyId\": \"" + ssn + "\",\n" + " \"idType\": \"LID\"\n" + " }\n" + " ]\n" + "}";
String partyId = ssn.substring(0, 3) + "-" + ssn.substring(3, 5) + "-" + ssn.substring(5, 9);
String customerPrefillPrimaryOwnerResponse = "{\"totalRecords\":1,\"customers\":[{\"partyId\":\"" + partyId + "\",\"idType\":\"LID\",\"sourceCode\":\"ICS\",\"firstName\":\"R�EEVES\",\"lastName\":\"WI�CKLIFF\",\"address1\":\"59 Ma�iling LANE\",\"address2\":\"ma�l2\",\"address3\":\"mail�3\",\"address4\":\"mai�l4\",\"city\":\"Mai�l\",\"state\":\"M�A\",\"zipCode\":\"010�10\",\"primaryPhone\":\"817504�0350\",\"alternatePhone\":\"81750�40351\",\"birthDate\":\"1902-0�2-10\",\"foreignIndicator\":\"N\",\"alternateAddress1\":\"88 LEG�AL LANE\",\"alternateAddress2\":\"leg�al2\",\"alternateAddress3\":\"lega�l3\",\"alternateAddress4\":\"lega�l4\",\"alternateCity\":\"LEG�AL\",\"alternateZipCode\":\"020�20\",\"alternateState\":\"L�A\",\"alternateForeignIndicator\":\"N\",\"mailTo\":\"\",\"alternateMailTo\":\"\",\"institutionId\":\"N\",\"taxId\":\"" + ssn + "\",\"taxIdIssuer\":\"S\"}]}";
stubFor(post(urlEqualTo("/my/url"))
.withRequestBody(equalToJson(customerPrefillPrimaryOwnerRequest))
.withHeader("Authorization", equalTo("Bearer " + OauthService.getOauthToken().orElse(new OauthToken()).getAccess_token()))
.willReturn(aResponse()
.withStatus(200)
.withHeader("Content-Type", "text/xml")
.withHeader("Content-Length", String.valueOf(customerPrefillPrimaryOwnerResponse.length()))
.withBody(customerPrefillPrimaryOwnerResponse)));
}
Your issue is coming from not escaping the replacement character. � translates to \uFFFD, but you'll need to escape the escape character as well (silly JSON), so that becomes \\uFFFD, or in the middle of another string "na\\uFFFDthan"
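A small sketch of what that escaping looks like in Java source, assuming the response body is built by string concatenation as in the question. The class and method names are invented for illustration. The doubled backslash in the Java literal produces a single backslash in the JSON text, and the JSON parser on the receiving side then decodes the escape to U+FFFD.

```java
final class ReplacementCharJson {
    // Builds a JSON fragment whose text contains the six-character escape
    // (backslash, u, F, F, F, D) instead of a raw replacement character.
    static String firstNameField(String before, String after) {
        return "{\"firstName\":\"" + before + "\\uFFFD" + after + "\"}";
    }

    public static void main(String[] args) {
        // The JSON text sent over the wire is pure ASCII.
        System.out.println(firstNameField("R", "EEVES"));
    }
}
```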
Parse html content for a value
I receive a Http response after a call as Html String and I would like to scrape certain value stored inside the ReportViewer1 variable. <html> .................... ........... <script type="text/javascript"> var ReportViewer1 = new ReportViewer('ReportViewer1', 'ReportViewer1_ReportToolbar', 'ReportViewer1_ReportArea_WaitControl', 'ReportViewer1_ReportArea_ReportCell', 'ReportViewer1_ReportArea_PreviewFrame', 'ReportViewer1_ParametersAreaCell', 'ReportViewer1_ReportArea_ErrorControl', 'ReportViewer1_ReportArea_ErrorLabel', 'ReportViewer1_CP', '/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent', '100', '', 'ReportViewer1_EditorPlaceholder', 'ReportViewer1_CalendarFrame', 'ReportViewer1_ReportArea_DocumentMapCell', { CurrentPageToolTip: 'STR_TELERIK_MSG_CUR_PAGE_TOOL_TIP', ExportButtonText: 'Export', ExportToolTip: 'Export', ExportSelectFormatText: 'Export to the selected format', FirstPageToolTip: 'First page', LabelOf: 'of', LastPageToolTip: 'Last Page', ProcessingReportMessage: 'Generating report...', NoPageToDisplay: 'No page to display.', NextPageToolTip: 'Next page', ParametersToolTip: 'Click to close parameters area|Click to open parameters area', DocumentMapToolTip: 'Hide document map|Show document map', PreviousPageToolTip: 'Previous page', TogglePageLayoutToolTip: 'Switch to interactive view|Switch to print preview', SessionHasExpiredError: 'Session has expired.', SessionHasExpiredMessage: 'Please, refresh the page.', PrintToolTip: 'Print', RefreshToolTip: 'Refresh', NavigateBackToolTip: 'Navigate back', NavigateForwardToolTip: 'Navigate forward', ReportParametersSelectAllText: '<select all>', ReportParametersSelectAValueText: '<select a value>', ReportParametersInvalidValueText: 'Invalid value.', ReportParametersNoValueText: 'Value required.', ReportParametersNullText: 'NULL', ReportParametersPreviewButtonText: 'Preview', ReportParametersFalseValueLabel: 'False', ReportParametersInputDataError: 'Missing or invalid parameter 
value. Please input valid data for all parameters.', ReportParametersTrueValueLabel: 'True', MissingReportSource: 'The source of the report definition has not been specified.', ZoomToPageWidth: 'Page Width', ZoomToWholePage: 'Full Page' }, 'ReportViewer1_ReportArea_ReportArea', 'ReportViewer1_ReportArea_SplitterCell', 'ReportViewer1_ReportArea_DocumentMapCell', true, true, 'PDF', 'ReportViewer1_RSID', true); </script> ................... ................... </html> The value is a90a0d41efa6429eadfefa42fc529de1 and this is in the middle of this content: '/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent', '100', Whats the best way I can parse this value using Java?
Parse the HTML with String class public class HtmlParser { public static void main(String args[]){ String result = getValuesProp(html); System.out.println("Result: "+ result); } static String PIVOT = "Telerik.ReportViewer.axd"; public static String getValuesProp(String json) { String subString; int i = json.indexOf(PIVOT); i+= PIVOT.length(); //', chars i+=2; subString = json.substring(i); i = subString.indexOf("'"); i++; subString = subString.substring(i); i = subString.indexOf("'"); subString = subString.substring(0,i); return subString; } static String html ="<html>\n" + "\n" + "<script type=\"text/javascript\">\n" + " var ReportViewer1 = new ReportViewer('ReportViewer1', 'ReportViewer1_ReportToolbar', 'ReportViewer1_ReportArea_WaitControl', 'ReportViewer1_ReportArea_ReportCell', 'ReportViewer1_ReportArea_PreviewFrame', 'ReportViewer1_ParametersAreaCell', 'ReportViewer1_ReportArea_ErrorControl', 'ReportViewer1_ReportArea_ErrorLabel', 'ReportViewer1_CP', '/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent', '100', '', 'ReportViewer1_EditorPlaceholder', 'ReportViewer1_CalendarFrame', 'ReportViewer1_ReportArea_DocumentMapCell', {\n" + " CurrentPageToolTip: 'STR_TELERIK_MSG_CUR_PAGE_TOOL_TIP',\n" + " ExportButtonText: 'Export',\n" + " ExportToolTip: 'Export',\n" + " ExportSelectFormatText: 'Export to the selected format',\n" + " FirstPageToolTip: 'First page',\n" + " LabelOf: 'of',\n" + " LastPageToolTip: 'Last Page',\n" + " ProcessingReportMessage: 'Generating report...',\n" + " NoPageToDisplay: 'No page to display.',\n" + " NextPageToolTip: 'Next page',\n" + " ParametersToolTip: 'Click to close parameters area|Click to open parameters area',\n" + " DocumentMapToolTip: 'Hide document map|Show document map',\n" + " PreviousPageToolTip: 'Previous page',\n" + " TogglePageLayoutToolTip: 'Switch to interactive view|Switch to print preview',\n" + " SessionHasExpiredError: 'Session has expired.',\n" + " SessionHasExpiredMessage: 'Please, refresh 
the page.',\n" + " PrintToolTip: 'Print',\n" + " RefreshToolTip: 'Refresh',\n" + " NavigateBackToolTip: 'Navigate back',\n" + " NavigateForwardToolTip: 'Navigate forward',\n" + " ReportParametersSelectAllText: '<select all>',\n" + " ReportParametersSelectAValueText: '<select a value>',\n" + " ReportParametersInvalidValueText: 'Invalid value.',\n" + " ReportParametersNoValueText: 'Value required.',\n" + " ReportParametersNullText: 'NULL',\n" + " ReportParametersPreviewButtonText: 'Preview',\n" + " ReportParametersFalseValueLabel: 'False',\n" + " ReportParametersInputDataError: 'Missing or invalid parameter value. Please input valid data for all parameters.',\n" + " ReportParametersTrueValueLabel: 'True',\n" + " MissingReportSource: 'The source of the report definition has not been specified.',\n" + " ZoomToPageWidth: 'Page Width',\n" + " ZoomToWholePage: 'Full Page'\n" + " }, 'ReportViewer1_ReportArea_ReportArea', 'ReportViewer1_ReportArea_SplitterCell', 'ReportViewer1_ReportArea_DocumentMapCell', true, true, 'PDF', 'ReportViewer1_RSID', true);\n" + " </script>\n" + "\n" + "</html>"; }
I would read the text a line at a time like how most files are read. Because the format will always be the same, you look for a line that begins with the characters "var ReportViewer1." Then you know you have found the line you want. You may need to strip some white space, although it will always be formatted with the same whitespace too (up to you really.) When you have the line, use the String .split() method to split that line into an array. There are nice delimiters there to split on ... "," or " " or ", " ... again, see what works best for you. Test the split up line parts for '/app/Telerik.ReportViewer.axd' ... the next member of your split array will be the value you are looking for. Again, the formatting will always be the same, so you can rely on that to find your variable. Of course, study the html text to make sure it does always follow the same format within the line you are investigating, but looking at it, I assume it probably does. Again, find your line ... split it on a delimiter ... and use some logic to find the element you are after in the split up line parts.
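The line-then-split approach described above might be sketched like this. It assumes, as that description does, that the declaration and the value sit on one line and that the arguments are separated by commas; the class name, method name, and the choice to return null when nothing is found are illustrative only.

```java
final class ReportViewerParser {
    // The argument that always precedes the value we want.
    static final String MARKER = "'/app/Telerik.ReportViewer.axd'";

    // Returns the argument following MARKER, or null if it is not found.
    static String extractId(String html) {
        for (String line : html.split("\\r?\\n")) {
            if (!line.trim().startsWith("var ReportViewer1")) {
                continue; // not the declaration line
            }
            String[] parts = line.split(",\\s*");
            for (int i = 0; i < parts.length - 1; i++) {
                if (parts[i].trim().equals(MARKER)) {
                    // Strip the surrounding single quotes from the next argument.
                    return parts[i + 1].trim().replace("'", "");
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<script>\n"
                + "  var ReportViewer1 = new ReportViewer('ReportViewer1', "
                + "'/app/Telerik.ReportViewer.axd', 'a90a0d41efa6429eadfefa42fc529de1', 'Percent');\n"
                + "</script>";
        System.out.println(extractId(html));
    }
}
```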
GWT Java calculate days between two dates on the server side
I am using GWT java. I have a report on the client side that I want to export to csv. So I am trying to create the report on the server side to pass back to the client side as a csv file so the user can store the csv file in their selected destination. The following code works on the client side; however, does not work on the server side (i.e., "Print 6." is displayed and "Print 7." is not displayed on the server side). There is no error message. The dates are "2016-11-09" and "2000-02-02". System.out.println(todays_date); System.out.println(pack.getDob()); System.out.println("Print 6."); float diffDOB = (CalendarUtil.getDaysBetween(pack.getDob(), todays_date)); System.out.println("Print 7.");
I was talking about this with a friend, who is not a programmer, and she said why not just send the MS Excel formula. Brilliant! So I have removed all the calculations and am now sending:
fileContent.append(pack.getSurname() + "," + pack.getFirstname() + "," + pack.getScout_no() + "," + pack.getgroup() + "," + pack.getDob() + ","
+ '"' + "=ROUNDDOWN(((TODAY()-E"+row+")/365),1)" + '"' + "," + pack.getstartDate() + ","
+ '"' + "=ROUNDDOWN(((TODAY()-G"+row+")/365),1)" + '"' + ","
+ '"' + "=EDATE(E"+row+",216)" + '"' + "\n");
The only issue now is that spaces in the string fields appear as + in the output file.
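If the day count is still wanted on the server rather than in Excel, a plain-JDK sketch could look like the following. It assumes Java 8 or later on the server and dates available as ISO strings such as "2016-11-09"; neither assumption is confirmed by the question, and the class and method names are made up. One possible reason the original call stopped silently is that CalendarUtil is a GWT client-side class, but that is a guess.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class ServerSideDays {
    // Whole days from startIso (inclusive) to endIso (exclusive), both yyyy-MM-dd.
    static long daysBetween(String startIso, String endIso) {
        return ChronoUnit.DAYS.between(LocalDate.parse(startIso), LocalDate.parse(endIso));
    }

    public static void main(String[] args) {
        // The two dates mentioned in the question.
        System.out.println(daysBetween("2000-02-02", "2016-11-09")); // prints 6125
    }
}
```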
How to set the multiple attributes of objectClass for UnboundID & OpenLDAP via Java 7
I'm not sure how to properly pass the multiple attributes needed for an OpenLDAP insert via UnboundID. I have omitted the objectClass attributes & received a "no objectClass" error. I have also tried comma-separated & the bracket/array route like below & received the "value #0 invalid per syntax" error. String[] ldifLines = {"dn: ou=users,dc=sub,dc=domain,dc=com", "cn: " + uid, "userPassword: " + pw, "description: user", "uidNumber: " + lclDT, "gidNumber: 504", "uid: " + uid, "homeDirectory: " + File.separator + "home" + File.separator + this.getStrippedUser(), "objectClass: {posixAccount, top}"}; LDAPResult ldapResult = lclLC.add(new AddRequest(ldifLines)); So, the question is, how do I successfully pass these objectClass attributes in the string array included above? Again, I have tried: "objectClass: top, posixAccount" as well. Thanks in advance!
It uses an LDIF representation, so if an attribute has multiple values, then the attribute appears multiple times. Like:
String[] ldifLines = {
"dn: ou=users,dc=sub,dc=domain,dc=com",
"objectClass: top",
"objectClass: posixAccount",
"cn: " + uid,
"userPassword: " + pw,
"description: user",
"uidNumber: " + lclDT,
"gidNumber: 504",
"uid: " + uid,
"homeDirectory: " + File.separator + "home" + File.separator + this.getStrippedUser()
};
LDAPResult ldapResult = lclLC.add(new AddRequest(ldifLines));
Also, the LDAP SDK allows you to use a shortcut and just do it in a single call without the need to create the array or the AddRequest object, like:
LDAPResult ldapResult = lclLC.add(
"dn: ou=users,dc=sub,dc=domain,dc=com",
"objectClass: top",
"objectClass: posixAccount",
"cn: " + uid,
"userPassword: " + pw,
"description: user",
"uidNumber: " + lclDT,
"gidNumber: 504",
"uid: " + uid,
"homeDirectory: " + File.separator + "home" + File.separator + this.getStrippedUser());
Writing and reading inherited objects from a file with a buffered reader
I am writing a librarian admin program and I have two classes: members and premium members. There is no real difference other than the fact that premium members can take out 5 books as opposed to only 3. I am using a buffered reader and writer to read and write these items, but I am wondering how (or whether I can at all) to read and write both kinds of object to a single .txt file. Here's the code I have so far, from the Member class:
public String toString(int i) {
String stg = "";
stg += memID + "#" + name + "#" + surname + "#" + address + "#" + email + "#" + cellnum + "#" + num + "#" + accntTyp + "#" + pass + "#" + book1 + "#" + book2 + "#" + book3 + "#";
return stg;
}
I have yet to do the premium members class (I wanted to get an answer while I do it), but it would look pretty much the same, with the exception of 2 extra books.
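One possible approach, sketched under several assumptions, is to start each line with a type tag and pick the class to construct from that tag when reading. The fields are cut down to an ID, a name, and a book list, and the tags "STD" and "PREM", the PremiumMember class, and the MemberFile helper are all hypothetical. In the real program the existing accntTyp field might already serve as the tag.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Member {
    final String memID;
    final String name;
    final List<String> books;

    Member(String memID, String name, List<String> books) {
        this.memID = memID;
        this.name = name;
        this.books = books;
    }

    int maxBooks() { return 3; }
    String typeTag() { return "STD"; }

    // One line per member: tag#id#name#book1#book2...
    String toLine() {
        StringBuilder sb = new StringBuilder(typeTag()).append('#').append(memID).append('#').append(name);
        for (String book : books) {
            sb.append('#').append(book);
        }
        return sb.toString();
    }

    // The leading tag decides which class is instantiated.
    static Member fromLine(String line) {
        String[] f = line.split("#");
        List<String> books = new ArrayList<>(Arrays.asList(f).subList(3, f.length));
        return f[0].equals("PREM") ? new PremiumMember(f[1], f[2], books) : new Member(f[1], f[2], books);
    }
}

class PremiumMember extends Member {
    PremiumMember(String memID, String name, List<String> books) {
        super(memID, name, books);
    }

    @Override int maxBooks() { return 5; }
    @Override String typeTag() { return "PREM"; }
}

final class MemberFile {
    // Writes both kinds of member to the same stream, one line each.
    static void saveAll(List<Member> members, Writer out) throws IOException {
        BufferedWriter writer = new BufferedWriter(out);
        for (Member m : members) {
            writer.write(m.toLine());
            writer.newLine();
        }
        writer.flush();
    }

    // Reads every non-empty line back into the matching class.
    static List<Member> loadAll(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        List<Member> result = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.isEmpty()) {
                result.add(Member.fromLine(line));
            }
        }
        return result;
    }
}
```

Passing a FileWriter to saveAll and a FileReader to loadAll would store both kinds of member in a single .txt file. Note that this format breaks if a field itself contains a "#".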