My Java Base64 results are not the same as PHP

I previously posted a question related to this work, but I am posting a new one because it was not completely resolved.
I am porting finished PHP code to Java.
It is a function that reads an encrypted file, decrypts it 16 bytes at a time, joins the results into a single string, and encodes it with Base64.
The PHP version is already running on the server, and I have to use Java to produce the same result.
When the file is decrypted, it looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<FileInfo>
...
<TextData> (text data) </TextData>
</FileInfo>
(image data)
That is the format; the text data part comes out exactly the same in PHP and Java.
I am trying to encode the image data part as Base64, but the result is different from PHP.
This is the part of the PHP code that decrypts and processes the file:
$fileContentArray = array(16);
$transContentArray = array(16);
$fileRead = fread($fp,16);
for($i = 0 ; $i < strlen($fileRead); $i++){
$fileContentArray[$i] = ord($fileRead[$i]);
}
$seed->SeedDecrypt($fileContentArray,$pdwRoundKey,$transContentArray);
$transStr = call_user_func_array("pack",array_merge(array("C16"),$transContentArray));
$mergeStr .= $transStr;
}
$dataExplode = explode("<TextData>",trim($mergeStr) );
$dataExplode1 = explode("</FileInfo>",trim($dataExplode[1]) );
$dataExplode2 = explode("</TextData>",$dataExplode1[0]);
$textData = iconv("EUC-KR","utf-8",$dataExplode2[0]);
$imageData = base64_encode(trim($dataExplode1[1]));
And this is the corresponding part of the Java code I wrote:
byte[] fileContentArray=new byte[n];
for(int i=0;i<fileContentArray.length;i++){
fileContentArray[i]=mergeArr[nReadCur+i];
}
seed.SeedDecrypt(fileContentArray, pdwRoundKey, outbuf);
System.arraycopy(outbuf, 0, resultArr, nReadCur, outbuf.length);
nReadCur=nReadCur+fileContentArray.length;
p=p+fileContentArray.length;
if(p>=nFileSize){
fis.close();
break;
}
}//while
mergeStr=new String(resultArr,"MS949");
String[] dataExplode=mergeStr.trim().split("<TextData>");
String[] dataExplode1=dataExplode[1].trim().split("</FileInfo>");
String[] dataExplode2=dataExplode1[0].trim().split("</TextData>");
String textData = "";
String imageData = "";
textData=dataExplode2[0];
imageData=dataExplode1[1];
Encoder encoder=Base64.getEncoder();
Decoder decoder=Base64.getDecoder();
byte[] encArr=encoder.encode(imageData.trim().getBytes("MS949"));
imageData=new String(encArr,"MS949");
The result of encoding the image data as Base64:
php: R0lGODlhAwNLBPcAAAAAAAAAMwAAZgAAmQAAzAAA/wArAAArMwArZgArmQArzAAr/wBVAABVMwBVZgBVmQBVzABV/wCAAACAMwCAZgCAmQCAzACA/ ... VzpYirO0le55zF0=
java: R0lGODlhAwNLBD8AAAAAAAAAMwAAZgAAPwAAPwAAPwArAAArMwArZgArPwArPwArPwBVAABVMwBVZgBVPwBVPwBVPwA/AAA/MwA/ZgA/PwA/PwA/PwA/ ... DAQEAOz9GPz8/IXY=
As you can see, the results differ.
Is there anything I'm missing? What should I do to make the Java result match PHP?
Also, this is mergeStr after reading the file:
java:
GIF89aK? 3 f ? ? ? + +3 +f +? +? +? U U3 Uf U? U? U? ? ?3 ?f ?? ?? ?? ? ?3 챖 첌 ぬ ? ? ?3 ?f ??
...
J뇽杞H?*]苛⒢쬝쥻쒳뎁諾X...
A?h?~?0a?2$ #삁?d?Dd??e ...
...
WC ;홃?뿿!v
php:
GIF89aK? 3 f ? ?  + +3 +f +? +? + U U3 Uf U? U? U 3 f ? ?  ? ? 챖 첌 ぬ ? ? ? ? ? 螂 ?  3 f ? ? ...
A??~?a?$ #삁?d?Dd?e...
...
WC ;홃??v余퍙W:X뒽킉??
As shown, the Java output contains a line break that I did not add, and the results differ slightly. Is this simply an encoding difference? Does it affect the Base64 conversion?
I tried UTF-8 as well and it failed again.
I used this code to load all bytes of the file at once:
FileInputStream fis = new FileInputStream(tpf);
fis.read(mergeArr);

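Java can produce a slightly different Base64 encoding than PHP: the URL-safe variant (Base64.getUrlEncoder()) uses - and _ in place of + and /. If that variant is in use, these helper functions make the PHP side compatible with the Java version.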
function base64url_encode($data)
{
return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}
function base64url_decode($data, $strict = false)
{
return base64_decode(strtr($data, '-_', '+/'), $strict);
}
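A note on the outputs quoted in the question: both strings use the standard alphabet and start identically, and the groups where they differ (Pw in the Java output) decode to byte 0x3F, the ? character. That is consistent with bytes that are not valid MS949 being replaced when the decrypted data passes through new String(resultArr, "MS949") and getBytes("MS949"). A minimal sketch of an alternative, assuming the decrypted data is available as a byte[] and that the image data is everything after </FileInfo>; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class ImageBase64Sketch {

    // Index of the first occurrence of pattern in data, or -1 if absent.
    static int indexOf(byte[] data, byte[] pattern) {
        outer:
        for (int i = 0; i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Bytes that PHP's trim() strips by default: space, \t, \n, \r, \0, \x0B.
    static boolean isPhpTrimByte(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r' || b == 0 || b == 0x0B;
    }

    // Base64-encodes everything after closingTag without ever building a String
    // from the binary part, so no byte can be replaced by '?'.
    static String encodeAfterTag(byte[] decrypted, String closingTag) {
        byte[] tag = closingTag.getBytes(StandardCharsets.US_ASCII);
        int pos = indexOf(decrypted, tag);
        if (pos < 0) {
            throw new IllegalArgumentException("closing tag not found: " + closingTag);
        }
        int start = pos + tag.length;
        int end = decrypted.length;
        // Mimic the trim() calls in the PHP version on the binary slice.
        while (start < end && isPhpTrimByte(decrypted[start])) {
            start++;
        }
        while (end > start && isPhpTrimByte(decrypted[end - 1])) {
            end--;
        }
        return Base64.getEncoder().encodeToString(Arrays.copyOfRange(decrypted, start, end));
    }
}
```

With the question's variables this would be called as ImageBase64Sketch.encodeAfterTag(resultArr, "</FileInfo>"). The XML part is real text, so it can still be decoded with MS949 separately.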

Related

How to figure out which character doesn't map to UTF-8

I maintain a small java servlet-based webapp that presents forms for input, and writes the contents of those forms to MariaDB.
The app runs on a Linux box, although the users visit the webapp from Windows.
Some users paste text into these forms that was copied from MSWord docs, and when that happens, they get internal exceptions like the following:
Caused by: org.mariadb.jdbc.internal.util.dao.QueryException:
Incorrect string value: '\xC2\x96 for...' for column 'ssimpact' at row 1
For instance, I tested it with text like the following:
Project – for
Where the dash is a "long dash" from the MSWord document.
I don't think it's possible to convert the wayward characters in this text to the "correct" characters, so I'm trying to figure out how to produce a reasonable error message that shows a substring of the bad text in question, along with the index of the first bad character.
I noticed postings like this: How to determine if a String contains invalid encoded characters .
I thought this would get me close, but it's not quite working.
I'm trying to use the following method:
private int findUnmappableCharIndex(String entireString) {
int charIndex;
for (charIndex = 0; charIndex < entireString.length(); ++ charIndex) {
String currentChar = entireString.substring(charIndex, charIndex + 1);
CharBuffer out = CharBuffer.wrap(new char[currentChar.length()]);
CharsetDecoder decoder = Charset.forName("utf-8").newDecoder();
CoderResult result = decoder.decode(ByteBuffer.wrap(currentChar.getBytes()), out, true);
if (result.isError() || result.isOverflow() || result.isUnderflow() || result.isMalformed() || result.isUnmappable()) {
break;
}
CoderResult flushResult = decoder.flush(out);
if (flushResult.isOverflow()) {
break;
}
}
if (charIndex == entireString.length() + 1) {
charIndex = -1;
}
return charIndex;
}
This doesn't work. I get "underflow" on the first character, which is a valid character. I'm sure I don't fully understand the decoder mechanism.
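CoderResult.UNDERFLOW is the normal result when a decoder has consumed all of its input, so isUnderflow() is true for valid characters as well. A Java String is also already decoded, so passing each character through getBytes() and back cannot show what the database will reject. A minimal sketch of one alternative, assuming the goal is to find the first character that a chosen target charset cannot represent. Which charset corresponds to the ssimpact column is an assumption that would need checking, and the class name is hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class UnmappableFinder {

    // Returns the index (in UTF-16 units) of the first character that the
    // target charset cannot encode, or -1 if the whole string is encodable.
    static int findUnmappableCharIndex(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        int i = 0;
        while (i < text.length()) {
            // Step by code point so surrogate pairs are tested as one unit.
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            if (!encoder.canEncode(text.subSequence(i, i + length))) {
                return i;
            }
            i += length;
        }
        return -1;
    }
}
```

Note that the bytes in the error message, \xC2\x96, are the UTF-8 form of U+0096, a C1 control character, not of the dash U+2013. That suggests, without confirming, that the pasted text was decoded with the wrong charset somewhere before it reached the database.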

Converting CSV file to LIBSVM compatible data file using python

I am doing a project using libsvm and I am preparing my data to use the lib. How can I convert CSV file to LIBSVM compatible data?
CSV File:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/data/iris.csv
In the frequently asked questions:
How to convert other data formats to LIBSVM format?
It depends on your data format. A simple way is to use libsvmwrite in the libsvm matlab/octave interface. Take a CSV (comma-separated values) file in UCI machine learning repository as an example. We download SPECTF.train. Labels are in the first column. The following steps produce a file in the libsvm format.
matlab> SPECTF = csvread('SPECTF.train'); % read a csv file
matlab> labels = SPECTF(:, 1); % labels from the 1st column
matlab> features = SPECTF(:, 2:end);
matlab> features_sparse = sparse(features); % features must be in a sparse matrix
matlab> libsvmwrite('SPECTFlibsvm.train', labels, features_sparse);
The transformed data are stored in SPECTFlibsvm.train.
Alternatively, you can use convert.c to convert CSV format to libsvm format.
But I don't want to use MATLAB; I use Python.
I also found a solution that uses Java.
Can anyone recommend a way to tackle this problem?
You can use csv2libsvm.py to convert csv to libsvm data
python csv2libsvm.py iris.csv libsvm.data 4 True
where 4 means target index, and True means csv has a header.
Finally, you can get libsvm.data as
0 1:5.1 2:3.5 3:1.4 4:0.2
0 1:4.9 2:3.0 3:1.4 4:0.2
0 1:4.7 2:3.2 3:1.3 4:0.2
0 1:4.6 2:3.1 3:1.5 4:0.2
...
from iris.csv
150,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
4.7,3.2,1.3,0.2,0
4.6,3.1,1.5,0.2,0
...
csv2libsvm.py does not work with Python 3, and it does not support label targets (string targets), so I have slightly modified it. It should now work with Python 3 as well as with label targets.
I am very new to Python, so my code may not follow best practices, but I hope it is good enough to help someone.
#!/usr/bin/env python
"""
Convert CSV file to libsvm format. Works only with numeric variables.
Put -1 as label index (argv[3]) if there are no labels in your file.
Expecting no headers. If present, headers can be skipped with argv[4] == 1.
"""
import sys
import csv
import operator
from collections import defaultdict
def construct_line(label, line, labels_dict):
    new_line = []
    if label.isnumeric():
        if float(label) == 0.0:
            label = "0"
        # numeric labels are written as they are
        new_line.append(label)
    else:
        # string labels are mapped to consecutive integer ids
        if label in labels_dict:
            new_line.append(labels_dict.get(label))
        else:
            label_id = str(len(labels_dict))
            labels_dict[label] = label_id
            new_line.append(label_id)
    for i, item in enumerate(line):
        if item == '' or float(item) == 0.0:
            continue
        elif item=='NaN':
            item="0.0"
        new_item = "%s:%s" % (i + 1, item)
        new_line.append(new_item)
    new_line = " ".join(new_line)
    new_line += "\n"
    return new_line
# ---
input_file = sys.argv[1]
try:
    output_file = sys.argv[2]
except IndexError:
    output_file = input_file+".out"
try:
    label_index = int( sys.argv[3] )
except IndexError:
    label_index = 0
try:
    skip_headers = sys.argv[4]
except IndexError:
    skip_headers = 0
i = open(input_file, 'rt')
o = open(output_file, 'wb')
reader = csv.reader(i)
if skip_headers:
    headers = reader.__next__()
labels_dict = {}
for line in reader:
    if label_index == -1:
        label = '1'
    else:
        label = line.pop(label_index)
    new_line = construct_line(label, line, labels_dict)
    o.write(new_line.encode('utf-8'))

Whitespaces in Java/PHP

Spaces are not changed to underscores when sent from Java to PHP to SQL.
Java code:
String urlString = "http://www.mysite.com/auth/verifyuser.php?name="+name.toLowerCase().replace(" ","_");
PHP code:
$name = mysql_real_escape_string($_GET['name']);
$name = str_replace(' ', '_', $name);
$query = "select * from authinfo where name LIKE '$name'";
mysql_query($query);
$num = mysql_affected_rows();
if ($num > 0) {
echo '1';
} else {
echo '0';
}
When I log the value on the SQL side, it somehow still shows spaces instead of underscores (even though I replace them in both Java and PHP), and the PHP file returns '0' rather than '1'. I've heard the issue might be whitespace. It seems to happen only to certain users, mostly Mac users.
If your php file is returning a 0, that means your query is not getting executed. Where are you establishing a connection with the database before executing the query?
Remark: use where name = '$name' rather than LIKE.
mysql_affected_rows applies to INSERT, UPDATE and DELETE. For a SELECT, count the rows with mysql_num_rows:
$r = mysql_query($query);
$num = mysql_num_rows($r);
It's unsafe to put the raw name into a URL without encoding it.
String urlString = "http://www.example.com/auth/verifyuser.php?name=" + URLEncoder.encode(name.toLowerCase(), "UTF-8");
In PHP, $_GET is already URL-decoded, so read the value directly without calling urldecode again:
$name = $_GET['name'];
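A minimal sketch of the encoding step in Java. Treating non-breaking spaces and other Unicode separators as whitespace is an assumption (the thread does not establish why only some users are affected), and the class and method names are hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Locale;

public class NameParam {

    // Lower-cases the name and collapses each run of whitespace, including
    // Unicode separators such as U+00A0, into a single underscore.
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[\\s\\p{Z}]+", "_");
    }

    // Appends the normalized, percent-encoded name as the "name" query parameter.
    static String buildUrl(String base, String name) {
        try {
            return base + "?name=" + URLEncoder.encode(normalize(name), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike replace(" ", "_"), this collapses consecutive spaces into one underscore; if the stored names keep one underscore per space, drop the + from the pattern.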

SQL Server Java Compatible GUID / UUID (Big Endian UniqueIdentifier)

I am trying to convert a 128-bit binary value to a uniqueidentifier in SQL that matches the result in .NET and Java.
I know Java is big-endian, so I would like to use that as the base.
I can get the correct endianness in .NET, but am really struggling with it in SQL Server.
Java:
byte[] bytesOfMessage = "google.com".getBytes("UTF-8");
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] md5 = md.digest(bytesOfMessage);
ByteBuffer bb = ByteBuffer.wrap(md5);
LongBuffer ig = bb.asLongBuffer();
return new UUID(ig.get(0), ig.get(1));
returns 1d5920f4-b44b-27a8-02bd-77c4f0536f5a
.Net
System.Security.Cryptography.MD5 c = System.Security.Cryptography.MD5.Create();
byte[] b = c.ComputeHash(Encoding.UTF8.GetBytes("google.com"));
int z = System.Net.IPAddress.HostToNetworkOrder(BitConverter.ToInt32(b, 0));
short y = System.Net.IPAddress.HostToNetworkOrder(BitConverter.ToInt16(b, 4));
short x = System.Net.IPAddress.HostToNetworkOrder(BitConverter.ToInt16(b, 6));
Guid g = new Guid(z, y, x, b.Skip(8).ToArray());
return g;
returns 1d5920f4-b44b-27a8-02bd-77c4f0536f5a
SQL
DECLARE @s VARCHAR(MAX) = 'google.com' --'goolge.com'
DECLARE @md5 BINARY(16) = HASHBYTES('MD5', @s)
DECLARE @a BINARY(4) = CONVERT(BINARY(4), REVERSE(CONVERT(BINARY(4), LEFT(@md5, 4))))
DECLARE @b BINARY(2) = CONVERT(BINARY(2), REVERSE(CONVERT(BINARY(2), RIGHT(@md5, 12))))
DECLARE @c BINARY(2) = CONVERT(BINARY(2), REVERSE(CONVERT(BINARY(2), RIGHT(@md5, 10))))
DECLARE @d BINARY(8) = CONVERT(BINARY(8), RIGHT(@md5, 8))
SELECT CONVERT(UNIQUEIDENTIFIER, @a + @b + @c + @d)
returns D86B5A7F-7A25-4895-A6D0-63BA3A706627
I am able to get all three to produce the same value when converting to an int64, but the GUID is baffling me.
Original Issue
Original Answer
If you correct the spelling of google in your SQL example (it's goolge in your post), you get the right result.
SQL Server does not support UTF-8 encoding. See Description of storing UTF-8 data in SQL Server. Use suggestion from Michael Harmon to add your .NET function to SQLServer to do the conversion. See How to encode... for instructions on how to add your .NET function to SQLServer.
Alternatively, don't specify UTF-8 in your Java and .NET code. I believe SQL Server will use same 256 bit encoding for varchar as does Java and .NET. (But not totally sure of this.)
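For comparison, the byte reordering that the SQL in the question performs with REVERSE can be sketched in Java. SQL Server's uniqueidentifier and .NET's Guid store the first three groups (4, 2 and 2 bytes) in little-endian order, while java.util.UUID is big-endian throughout. The class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

public class GuidBytes {

    // Returns the 16 bytes that SQL Server's CONVERT(UNIQUEIDENTIFIER, ...)
    // would turn into the same GUID as the given Java UUID.
    static byte[] toSqlServerBytes(UUID uuid) {
        long msb = uuid.getMostSignificantBits();
        ByteBuffer out = ByteBuffer.allocate(16);
        out.order(ByteOrder.LITTLE_ENDIAN);
        out.putInt((int) (msb >>> 32));      // first group: 4 bytes, reversed
        out.putShort((short) (msb >>> 16));  // second group: 2 bytes, reversed
        out.putShort((short) msb);           // third group: 2 bytes, reversed
        out.order(ByteOrder.BIG_ENDIAN);
        out.putLong(uuid.getLeastSignificantBits()); // last 8 bytes unchanged
        return out.array();
    }

    // Lower-case hex rendering, for comparing against SQL Server binary output.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

For the UUID quoted in the question, 1d5920f4-b44b-27a8-02bd-77c4f0536f5a, the reordered bytes are f420591d4bb4a82702bd77c4f0536f5a.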

JSON - is there any XML CDATA equivalent?

I'm looking for a way that json parsing will take information as is (as if it was CDATA) - and not to try to serialize that.
We use both .net and java (client and server) - so the answer should be about JSON structure
Is there any way to achieve this structure?
Thanks.
There is no XML CDATA equivalent in JSON. But you can encode your message in a string literal using something like base64. See this question for more details.
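A minimal sketch of that approach in Java, assuming the payload is text that both sides agree to treat as UTF-8. Base64 output contains only letters, digits, +, / and =, so it needs no JSON escaping. The class, method and field names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class JsonBase64Field {

    // Builds {"<field>":"<base64 of raw>"}; the field name is assumed to
    // contain no characters that need JSON escaping.
    static String toJson(String field, String raw) {
        String encoded = Base64.getEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
        return "{\"" + field + "\":\"" + encoded + "\"}";
    }

    // Reverses the encoding once a JSON parser has handed back the string value.
    static String decodeValue(String base64Value) {
        return new String(Base64.getDecoder().decode(base64Value), StandardCharsets.UTF_8);
    }
}
```

The trade-off is that the value is no longer human-readable in the JSON file and grows by roughly a third.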
This is a development of Raman's suggestion above.
I love the JSON format, but there are two things I want to be able to do with it and cannot:
Paste some arbitrary text into a value using a text editor
Transparently convert between XML and JSON if the XML contains CDATA sections.
This thread is germane to both these issues.
I am proposing to overcome this in the following manner, which doesn't break the formal definition of JSON, and I wonder if I'm storing up any problems if I do this?
Define a JSON-compatible string format as follows:
"<![CDATA[ (some text, escaped according to JSON rules) ]]>"
Write an Unescape routine in my favorite programming language, which unescapes anything between <![CDATA[ and ]]>. This will be called before offering any JSON file to my text editor.
Write the complementary routine to call after editing the file, which re-escapes anything between <![CDATA[ and ]]> according to JSON rules.
Then in order to paste any arbitrary data into the file, all I need to do is signal the start and end of the arbitrary data within a JSON string by typing <![CDATA[ before and ]]> after it.
This is a routine to call before and after text-editing, in Python3:
import os
escape_list = {
    8 : 'b',
    9 : 't',
    10: 'n',
    12: 'f',
    13: 'r',
    34: '"',
} #List of ASCII character codes to escape, with their escaped equivalents
escape_char = "\\" #this must be dealt with separately
unlikely_string = "ZzFfGgQqWw"
shebang = "#!/json/unesc\n"
start_cdata = "<![CDATA["
end_cdata = "]]>"
def escapejson(json_path):
    if (os.path.isfile(json_path)): #If it doesn't exist, we can't update it
        with open(json_path) as json_in:
            data_in = json_in.read() #use read() 'cos we're going to treat as txt
        #Set direction of escaping
        if (data_in[:len(shebang)] == shebang): #data is unescaped, so re-escape
            data_in = data_in[len(shebang):]
            unescape = False
            data_out = ""
        else:
            data_out = shebang
            unescape = True
        while (data_in != ""): #while there is still some input to deal with
            x = data_in.find(start_cdata)
            x1 = data_in.find(end_cdata)
            if (x > -1): #something needs escaping
                if (x1 <0):
                    print ("Unterminated CDATA section!")
                    exit()
                elif (x1 < x): #end before next start
                    print ("Extra CDATA terminator!")
                    exit()
                data_out += data_in[:x]
                data_in = data_in[x:]
                y = data_in.find(end_cdata) + len(end_cdata)
                to_fix = data_in[:y] #this is what we're going to (un)escape
                if (to_fix[len(start_cdata):].find(start_cdata) >= 0):
                    print ("Nested CDATA sections not supported!")
                    exit()
                data_in = data_in[y:] #chop data to fix from front of source
                if (unescape):
                    to_fix = to_fix.replace(escape_char + escape_char,unlikely_string)
                    for each_ascii in escape_list:
                        to_fix = to_fix.replace(escape_char + escape_list[each_ascii],chr(each_ascii))
                    to_fix = to_fix.replace(unlikely_string,escape_char)
                else:
                    to_fix = to_fix.replace(escape_char,escape_char + escape_char)
                    for each_ascii in escape_list:
                        to_fix = to_fix.replace(chr(each_ascii),escape_char + escape_list[each_ascii],)
                data_out += to_fix
            else:
                if (x1 > 0):
                    print ("Termination without start!")
                    exit()
                data_out += data_in
                data_in = ""
        #Save all to file of same name in same location
        try:
            with open(json_path, 'w') as outfile:
                outfile.write(data_out)
        except IOError as e:
            print("Writing "+ json_path + " failed "+ str(e))
    else:
        print("JSON file not found")
Operating on the following legal JSON data
{
"test": "<![CDATA[\n We can put all sorts of wicked things like\n \\slashes and\n \ttabs and \n \"double-quotes\"in here!]]>"
}
...will produce the following:
#!/json/unesc
{
"test": "<![CDATA[
We can put all sorts of wicked things like
\slashes and
tabs and
"double-quotes"in here!]]>"
}
In this form, you can paste in any text between the markers. Calling the routine again will change it back to the original legal JSON.
I think this can also be made to work when converting to/from XML with CDATA regions. (I'm going to try that next!)
You can create a YAML file and convert to JSON. For example:
test.yaml
storage:
  files:
  - filesystem: root
    path: /etc/sysconfig/network/ifcfg-eth0
    mode: 644
    overwrite: true
    contents:
      source: |
        data:,
        IPV6INIT=yes
        IPV6_AUTOCONF=yes
... then run yaml2json_pretty (shown later), like this:
#!/bin/bash
cat test.yaml | yaml2json_pretty > test.json
... which produces:
test.json
{
  "storage": {
    "files": [
      {
        "filesystem": "root",
        "path": "/etc/sysconfig/network/ifcfg-eth0",
        "mode": 644,
        "overwrite": true,
        "contents": {
          "source": "data:,\nIPV6INIT=yes\nIPV6_AUTOCONF=yes\n"
        }
      }
    ]
  }
}
This is the source code of yaml2json_pretty:
#!/usr/bin/env python3
import sys, yaml, json
print(json.dumps(yaml.load(sys.stdin.read(),Loader=yaml.FullLoader), sort_keys=False, indent=2))
More tricks similar to this yaml2json_pretty at: http://github.com/frgomes/bash-scripts
http://www.json.org/ describes the JSON format in detail. According to it, JSON doesn't support a CDATA-like value type.
To achieve a CDATA-like structure, you can apply custom logic to string-based values (and do it the same way in both the .NET and Java implementations). E.g.
{
"type" : "CDATA",
"value" : "Value that I will handle with my custom logic on java and .net side"
}
