Protobuf JsonFormat.printer converts long to String in JSON

I have run into JsonFormat.printer printing a long (fixed64) value as a String in JSON.
Is there any way or option to tell JsonFormat.printer not to do this conversion (long (fixed64) -> String in JSON)?
The JSON is consumed by a Java app, so representing fixed64 as an integer in JSON should not be a problem.
Here is the code:
In data.proto
syntax = "proto2";
message data{
required fixed64 user_id = 1;
required int32 member_id = 2;
}
Here is the java code, the file format is *.pb.gz
import com.google.protobuf.util.JsonFormat;
.......
//print data in JSON format
final InputStream input = new GZIPInputStream(new FileInputStream(pathToFile));
Message m;
m = defaultMsg.getParserForType().parseDelimitedFrom(input);
String jsonString = JsonFormat.printer().preservingProtoFieldNames().omittingInsignificantWhitespace().print(m);
Generated Java class: Data.java (Generated with protoc 2.6.1)
...
private long userId_;
...
private int memberId_;
...
Expected result:
{"user_id":6546585813946021349,"member_id":7521}
Actual result:
{"user_id":"6546585813946021349","member_id":7521}
The user_id is a String in the JSON, but I want it as an integer.
Thanks
David

It seems this is by design, according to the source code. All 64-bit integer types (UINT64 and FIXED64 as well as the signed ones shown below) are always printed with surrounding double quotes, no questions asked:
https://github.com/protocolbuffers/protobuf/blob/f9d8138376765d229a32635c9209061e4e4aed8c/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L1081-L1084
case INT64:
case SINT64:
case SFIXED64:
generator.print("\"" + ((Long) value).toString() + "\"");
In that same file, a few lines above, you can see that INT32 types are only double quoted if they're keys in a map (which your proto obviously doesn't have).
So, I'd ask for more information on the protobuf mailing list, or maybe report it as a bug/feature-request.
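For context, the proto3 JSON mapping represents 64-bit integer types as decimal strings, because many JSON parsers read numbers as IEEE 754 doubles, which cannot hold every 64-bit value. The sketch below is only an illustration (the class and method names are made up for this example and are not part of protobuf). It shows that the user_id from the question does not survive a round trip through a double, and that a Java consumer can recover the value from the quoted form.

```java
public class Int64JsonDemo {
    // Returns true if the value is unchanged after being stored in a double,
    // which is how many JSON parsers hold numbers.
    public static boolean survivesDoubleRoundTrip(long value) {
        return (long) (double) value == value;
    }

    // Parses the quoted decimal form. fixed64 is unsigned, so the unsigned
    // parser is used; the result is the same bit pattern in a Java long.
    public static long parseQuotedInt64(String jsonStringValue) {
        return Long.parseUnsignedLong(jsonStringValue);
    }

    public static void main(String[] args) {
        long userId = 6546585813946021349L;
        System.out.println(survivesDoubleRoundTrip(userId));                   // false
        System.out.println(parseQuotedInt64("6546585813946021349") == userId); // true
    }
}
```

On the consuming side, reading the field as a string and converting it explicitly keeps the full 64-bit value.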

Related

Parquet Binary storing value in encoded format

I'm creating a parquet file from Java using org.apache.parquet.*. No issues with other data types, but when a binary value is written and I cat the parquet file using parquet-tools, it is showing the value in encoded format. Because of that, the parquet is not processed in my system further.
Code block:
case BINARY:
recordConsumer.addBinary(stringToBinary(val));
break;
AND
private Binary stringToBinary(Object value) {
return Binary.fromString(value.toString());
}
Schema used is:
message m {
required INT64 id;
required binary username;
required boolean active;
}
When I cat:
parquet-tools cat <parquetFileName>
I see something like this:
id = 1
username = TmFtZTE=
active = true
id = 2
username = TmFtZTI=
active = false
I want to see the actual Username passed and not the encoded strings.

Try annotating the column as a UTF8 string in your schema, so that tools know the bytes are text:
required binary username (UTF8);
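The values printed by parquet-tools in the question look like Base64 renderings of the raw column bytes (an inference from the output shown, not something confirmed against the parquet-tools source). A small check, with a made-up class name, decodes them back to the original usernames:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ParquetBinaryDemo {
    // Decodes a Base64 string as displayed by the tool back into UTF-8 text.
    public static String decode(String shown) {
        byte[] raw = Base64.getDecoder().decode(shown);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("TmFtZTE=")); // Name1
        System.out.println(decode("TmFtZTI=")); // Name2
    }
}
```

So the data appears to be stored correctly; without the UTF8 annotation the column is simply treated as opaque bytes when displayed.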

How to encode a string to base64 encoding algorithm using PowerBuilder

Requirement:
Given a string, we need to generate its Base64-encoded form.
How can we implement this in PowerBuilder?
For reference, The Java implementation of the above case is as follows:
import org.apache.tomcat.util.codec.binary.Base64;
import java.io.UnsupportedEncodingException;
public String getClientEncoded() throws UnsupportedEncodingException {
String givenString = "Input_String";
// Encode the UTF-8 bytes of the input string
String bytesEncoded = Base64.encodeBase64String(givenString.getBytes("UTF-8"));
System.out.println("encoded value is " + bytesEncoded);
return bytesEncoded;
}
============================================================
Following Matt's reply, I used the code below from the 1st link:
String ls_valueToBeEncoded, ls_encoded
Blob lblob
ULong lul_len, lul_buflen
Boolean lb_rtn
ls_valueToBeEncoded = "realhowto"
lblob = Blob(ls_valueToBeEncoded)
lul_len = Len(lblob)
lul_buflen = lul_len * 2
ls_encoded = Space(lul_buflen)
lb_rtn = CryptBinaryToString(lblob, &
lul_len, CRYPT_STRING_BASE64, &
ref ls_encoded, lul_buflen) // Used ref ls_encoded to get the string. Otherwise, junk characters get stored in ls_encoded.
=======================================
I declared the following Global External Function:
FUNCTION boolean CryptBinaryToString ( &
Blob pbBinary, &
ulong cbBinary, &
ulong dwFlags, &
Ref string pszString, &
Ref ulong pcchString ) &
LIBRARY "crypt32.dll" ALIAS FOR "CryptBinaryToStringA;Ansi"
=========================================
According to the 1st link suggested by Matt, the string "realhowto" should be converted to "cmVhbGhvd3Rv".
But when I tried the above code, I got "cgBlAGEAbABoAG8AdwB0AG8A".
Any advice will be appreciated.

Check out this link
Make sure you look at the comments as well.
Another option is here.
Real's How To is a very good reference for many PowerBuilder tips.

My encryption examples also have the API functions used by Real's example.
http://www.topwizprogramming.com/freecode_bcrypt.html

Checked with the 1st link given by Matt.
The solution worked.
Only thing I was missing earlier was while converting the original string to blob, we need to convert via the following:
String ls_original_string
Blob lblob_test
lblob_test = Blob(ls_original_string, EncodingAnsi!)
The blob lblob_test can be passed as an argument to the function CryptBinaryToString(...)
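The two results in the question can be reproduced outside PowerBuilder, which suggests why the encoding argument matters. The Java sketch below (class and method names are made up; ISO-8859-1 stands in for ANSI, which is equivalent for this ASCII-only input) encodes the same text from single-byte and from UTF-16LE bytes. That the default Blob(string) call yields UTF-16LE bytes is an inference from the matching output.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64CharsetDemo {
    // Base64 of the single-byte (ANSI-like) representation of the text.
    public static String encodeAnsi(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Base64 of the UTF-16 little-endian representation (two bytes per character).
    public static String encodeUtf16Le(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_16LE));
    }

    public static void main(String[] args) {
        System.out.println(encodeAnsi("realhowto"));    // cmVhbGhvd3Rv
        System.out.println(encodeUtf16Le("realhowto")); // cgBlAGEAbABoAG8AdwB0AG8A
    }
}
```

The second value is exactly the unexpected string from the question, so the Base64 call itself was working; only the bytes handed to it differed.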

How can I set pageToken to get item lists from Google Cloud Storage via Java SDK?

I want to set pageToken to list items stored in Google Cloud Storage. I'm using the Google API Client Library for Java v1.19.x.
I don't know how to generate a pageToken from a file path (or file name).
Two files are stored in the bucket.
my-bucket
/test.csv
/test2.csv
When I tried Google APIs Explorer with following parameters, I could get nextPageToken Cgh0ZXN0LmNzdg==.
And I found out that I can get test.csv string by decoding nextPageToken with base64.
bucket: my-bucket
pageToken:
prefix: test
maxResults: 1
{"kind": "storage#objects", "nextPageToken": "Cgh0ZXN0LmNzdg==", ...}
But how can I get Cgh0ZXN0LmNzdg== from test.csv?
I tried plain Base64 encoding, but the result didn't match.
import com.google.api.client.repackaged.org.apache.commons.codec.binary.Base64;
String lastFile = "test.csv";
String token = Base64.encodeBase64String(lastFile.getBytes());
String bucket = "my-bucket";
String prefix = "test";
Storage.Objects.List listObjects = client.objects().list(bucket);
listObjects.setPrefix(prefix);
listObjects.setPageToken(token);
long maxResults = 1;
listObjects.setMaxResults(maxResults);
do {
Objects objects = listObjects.execute();
List<StorageObject> items = objects.getItems();
token = objects.getNextPageToken();
listObjects.setPageToken(token);
} while (token != null);

I worked out how to build the next token from a file path string myself.
How to get nextToken from path string
String nextToken = base64encode(0x0a + asciiCode + pathString)
asciiCode takes a value between 0x01 (SOH) and 0x7f (DEL). It seems to depend on the path length.
my-bucket/
a/a (3 bytes) 0x03
a/ab (4 bytes) 0x04
test.txt (8 bytes) 0x08
Notice
If the path is longer than 1024 bytes, another rule seems to apply, but I couldn't work out what it is.
See also Object Name Requirements
import com.google.common.base.Charsets;
import com.google.common.io.BaseEncoding;
String lastFile = "test.csv";
String token = base64Encode(lastFile);
String bucket = "my-bucket";
String prefix = "test";
Storage.Objects.List listObjects = client.objects().list(bucket);
listObjects.setPrefix(prefix);
listObjects.setPageToken(token);
long maxResults = 1;
listObjects.setMaxResults(maxResults);
do {
Objects objects = listObjects.execute();
List<StorageObject> items = objects.getItems();
token = objects.getNextPageToken();
listObjects.setPageToken(token);
} while (token != null);
private String base64Encode(String path) {
byte[] utf8 = path.getBytes(Charsets.UTF_8);
byte[] encoding = new byte[utf8.length + 2];
encoding[0] = 0x0a;
// Length byte; this simple form only covers paths of up to 0x7f bytes
encoding[1] = (byte) utf8.length;
// Copy the path bytes after the two-byte header
System.arraycopy(utf8, 0, encoding, 2, utf8.length);
return BaseEncoding.base64().encode(encoding);
}

I know this question is already answered and is about Java, but I'd like to mention that it applies to PHP as well.
With the help of sakama's accepted answer above, I worked out a PHP version of the solution.
The PHP equivalent for generating the token is as follows:
base64_encode(pack('c', 0x0a) . pack('c', $path_string_length) . pack('a*', $path_string));
The byte pattern does indeed seem to be (as sakama already mentioned):
<line feed><line data length><line data>
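The byte pattern can be checked against the token from the question using only the JDK. This is a sketch under the assumptions stated in the answers: the token layout is observed behavior rather than a documented format, the class and method names are made up, and names longer than 0x7f bytes are rejected because the rule for them is unknown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PageTokenDemo {
    // Builds <0x0a><length byte><name bytes> and Base64-encodes it.
    public static String tokenFor(String objectName) {
        byte[] name = objectName.getBytes(StandardCharsets.UTF_8);
        if (name.length == 0 || name.length > 0x7f) {
            throw new IllegalArgumentException("only names of 1..127 bytes are covered by this sketch");
        }
        byte[] raw = new byte[name.length + 2];
        raw[0] = 0x0a;               // leading marker byte (line feed)
        raw[1] = (byte) name.length; // length of the name in bytes
        System.arraycopy(name, 0, raw, 2, name.length);
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        System.out.println(tokenFor("test.csv")); // Cgh0ZXN0LmNzdg==
    }
}
```

Since the format is undocumented, it may change; where possible, use the nextPageToken value returned by the API instead of constructing one.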

How to convert a String FIX message to FIX FIX50SP2 format using QuickFixJ

I need some quick help. I am a newbie to QuickFixJ. I have a FIX message in a txt file and need to convert it into FIX50SP2 format. The code snippet is below.
String fixMsg = "1128=99=25535=X49=CME34=47134052=20100318-03:21:11.36475=20120904268=2279=122=848=336683=607400107=ESU2269=1270=140575271=152273=121014000336=2346=521023=1279=122=848=336683=607401107=ESU2269=1270=140600271=206273=121014000336=2346=681023=210=159";
System.out.println("FixMsg String:"+fixMsg);
Message FIXMessage = new Message();
DataDictionary dd = new DataDictionary("FIX50SP2.xml");
FIXMessage.fromString(fixMsg, dd, false);
System.out.println("FIXMessage Output:" + FIXMessage.toString()); // Print message after parsing
MsgType msgType = new MsgType();
System.out.println(FIXMessage.getField(msgType));
Here is the output:
FixMsg String:1128=99=15835=X49=CME34=47164052=2012090312102051175=20120904268=1279=122=848=336683=607745107=ESU2269=1270=140575271=123273=121020000336=2346=501023=110=205
FIXMessage Output:9=6135=X34=47164049=CME52=2012090312102051175=20120904268=110=117
quickfix.FieldNotFound: Field [35] was not found in message.
at quickfix.FieldMap.getField(FieldMap.java:216)
at quickfix.FieldMap.getFieldInternal(FieldMap.java:353)
at quickfix.FieldMap.getField(FieldMap.java:349)
at MainApp.main(MainApp.java:52)
I want to extract the MsgType field (field 35). Could you please tell me where I am going wrong? I have noticed that after parsing to FIX50SP2 format, the converted FIX message is missing many data elements (see the output for details).
Thanks

As others have mentioned, MsgType is a header field, and you get it as follows:
String msgType = null;
if(FIXMessage.getHeader().isSetField(MsgType.FIELD)) {
msgType = FIXMessage.getHeader().getString(MsgType.FIELD);
}
System.out.println("MsgType is " + msgType);
The reason many data elements are missing after parsing is probably that your message has some custom tags (like tag 2346) that are not defined in your data dictionary (FIX50SP2.xml). Parsing of those tags therefore fails, and they are missing from the output.
To fix this, get the data dictionary from the party that is sending you the message and use it to parse the message.

I'm not familiar with FIX messages and QuickFixJ, but glancing at the Javadoc, it seems like you should use the identifyType method :
String fixMsg = "1128=99=25535=X49=CME34=47134052=20100318-03:21:11.36475=20120904268=2279=122=848=336683=607400107=ESU2269=1270=140575271=152273=121014000336=2346=521023=1279=122=848=336683=607401107=ESU2269=1270=140600271=206273=121014000336=2346=681023=210=159";
MsgType msgType = Message.identifyType(fixMsg);

You may find FixB framework useful as it deals well with non-standard use cases of FIX.
As in your case, to extract only data you are interested in, you need to define a class that will represent this data and to bind it to FIX using annotations. E.g.:
@FixBlock
public class MDEntry {
@FixField(tag=269) public int entryType; // you could define an enum type for it as well
@FixField(tag=278) public String entryId;
@FixField(tag=55) public String symbol;
}
...
FixFieldExtractor fixExtractor = new NativeFixFieldExtractor();
List<MDEntry> mdEntries = fixExtractor.getGroups(fixMsg, List.class, 268, FixMetaScanner.scanClass(MDEntry.class));
In more common cases, the FixSerializer interface should be used, but it requires a message with the MsgType(35) tag and a class annotated with @FixMessage(type="...") accordingly. E.g.:
@FixMessage(type="X")
public class MarketData {
@FixGroup(tag=268) public List<MDEntry> entries;
}
...
FixMetaDictionary fixMetaDictionary = FixMetaScanner.scanClassesIn("my.fix.classes.package");
FixSerializer fixSerializer = new NativeFixSerializer("FIX.5.0.SP2", fixMetaDictionary);
MarketData marketData = fixSerializer.deserialize(fixMsg);
I hope you will find it useful.

If you need just the MsgType, you're sure the message is correct, and you do not need any other field from the message, then I would recommend extracting MsgType from the string with a regexp,
e.g.: \u000135=(\w+)\u0001
It is MUCH FASTER than parsing (and validating) a string via QuickFix.
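A minimal sketch of that regexp approach with java.util.regex follows. The class name and the sample message are made up for illustration, and the message is assumed to use the standard SOH (0x01) field delimiter. Requiring the delimiter directly before "35=" keeps tags that merely end in 35 (such as 135) from matching.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MsgTypeExtractor {
    private static final char SOH = '\u0001';

    // SOH, then "35=", then the captured value, then the closing SOH.
    private static final Pattern MSG_TYPE = Pattern.compile(SOH + "35=(\\w+)" + SOH);

    // Returns the MsgType value, or empty if the tag is not present.
    public static Optional<String> extract(String fixMessage) {
        Matcher m = MSG_TYPE.matcher(fixMessage);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical, shortened message assembled with SOH separators.
        String msg = String.join(String.valueOf(SOH),
                "8=FIXT.1.1", "9=61", "35=X", "49=CME", "10=117") + SOH;
        System.out.println(extract(msg).orElse("not found")); // X
    }
}
```

This does no validation at all, so it only suits cases where the message is already known to be well formed.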

Is there a smart way to write a fixed-length flat file?

Is there any framework/library to help with writing fixed-length flat files in Java?
I want to write a collection of beans/entities into a flat file without worrying about conversions, padding, alignment, fillers, etc.
For example, I'd like to convert a bean like:
public class Entity{
String name = "name"; // length = 10; align left; fill with spaces
Integer id = 123; // length = 5; align left; fill with spaces
Integer serial = 321; // length = 5; align to right; fill with '0'
Date register = new Date();// length = 8; convert to yyyyMMdd
}
... into ...
name      123  0032120110505
mikhas    5000 0122120110504
superuser 1    0000120101231
...

You're not likely to encounter a framework that can cope with a "Legacy" system's format. In most cases, Legacy systems don't use standard formats, but frameworks expect them. As a maintainer of legacy COBOL systems and Java/Groovy convert, I encounter this mismatch frequently. "Worrying with conversions, padding, alignment, fillers, etcs" is primarily what you do when dealing with a legacy system. Of course, you can encapsulate some of it away into handy helpers. But most likely, you'll need to get real familiar with java.util.Formatter.
For example, you might use the Decorator pattern to create decorators to do the conversion. Below is a bit of groovy (easily convertible into Java):
class Entity{
String name = "name"; // length = 10; align left; fill with spaces
Integer id = 123; // length = 5; align left; fill with spaces
Integer serial = 321 // length = 5; align to right; fill with '0'
Date register = new Date();// length = 8; convert to yyyyMMdd
}
class EntityLegacyDecorator {
Entity d
EntityLegacyDecorator(Entity d) { this.d = d }
String asRecord() {
return String.format('%-10s%-5d%05d%tY%<tm%<td',
d.name,d.id,d.serial,d.register)
}
}
def e = new Entity(name: 'name', id: 123, serial: 321, register: new Date('2011/05/06'))
assert new EntityLegacyDecorator(e).asRecord() == 'name      123  0032120110506'
This is workable if you don't have too many of these and the objects aren't too complex. But pretty quickly the format string gets intolerable. Then you might want decorators for Date, like:
class DateYMD {
Date d
DateYMD(d) { this.d = d }
String toString() { return d.format('yyyyMMdd') }
}
so you can format with %s:
String asRecord() {
return String.format('%-10s%-5d%05d%s',
d.name,d.id,d.serial,new DateYMD(d.register))
}
But for significant number of bean properties, the string is still too gross, so you want something that understands columns and lengths that looks like the COBOL spec you were handed, so you'll write something like this:
class RecordBuilder {
final StringBuilder record
RecordBuilder(recordSize) {
record = new StringBuilder(recordSize)
record.setLength(recordSize)
}
def setField(pos,length,String s) {
// pos is 1-based; replace exactly `length` characters so the record size stays fixed
record.replace(pos - 1, pos - 1 + length, s.padRight(length))
}
def setField(pos,length,Date d) {
setField(pos,length, new DateYMD(d).toString())
}
def setField(pos,length, Integer i, boolean padded) {
if (padded)
setField(pos,length, String.format("%0" + length + "d",i))
else
setField(pos,length, String.format("%-" + length + "d",i))
}
String toString() { record.toString() }
}
class EntityLegacyDecorator {
Entity d
EntityLegacyDecorator(Entity d) { this.d = d }
String asRecord() {
RecordBuilder record = new RecordBuilder(28)
record.setField(1,10,d.name)
record.setField(11,5,d.id,false)
record.setField(16,5,d.serial,true)
record.setField(21,8,d.register)
return record.toString()
}
}
After you've written enough setField() methods to handle your legacy system, you'll briefly consider posting it on GitHub as a "framework" so the next poor sap doesn't have to do it again. But then you'll consider all the ridiculous ways you've seen COBOL store a "date" (MMDDYY, YYMMDD, YYDDD, YYYYDDD) and numerics (assumed decimal, explicit decimal, sign as trailing separate or sign as leading floating character). Then you'll realize why nobody has produced a good framework for this and occasionally post bits of your production code into SO as an example... ;)
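For readers who want the first format string from this answer in plain Java, a minimal sketch follows. The class name is made up, and java.time.LocalDate is used in place of java.util.Date as an assumption of Java 8 or later. The field widths are the ones from the question: name 10 left-aligned, id 5 left-aligned, serial 5 zero-padded, date as yyyyMMdd.

```java
import java.time.LocalDate;
import java.util.Locale;

public class FixedWidthRecordDemo {
    // %-10s  left-align name in 10 columns
    // %-5d   left-align id in 5 columns
    // %05d   right-align serial in 5 columns, zero-filled
    // %tY%<tm%<td  year, month and day of the same date argument
    public static String asRecord(String name, int id, int serial, LocalDate register) {
        return String.format(Locale.ROOT, "%-10s%-5d%05d%tY%<tm%<td",
                name, id, serial, register);
    }

    public static void main(String[] args) {
        String record = asRecord("name", 123, 321, LocalDate.of(2011, 5, 6));
        System.out.println("[" + record + "]"); // [name      123  0032120110506]
    }
}
```

Note that String.format pads but does not truncate, so values longer than their column would still need to be cut or rejected separately.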

If you are still looking for a framework, check out BeanIO at http://www.beanio.org

uniVocity-parsers goes a long way to support tricky fixed-width formats, including lines with different fields, paddings, etc.
Check out this example to write imaginary client & accounts details. This uses a lookahead value to identify which format to use when writing a row:
FixedWidthFields accountFields = new FixedWidthFields();
accountFields.addField("ID", 10); //account ID has length of 10
accountFields.addField("Bank", 8); //bank name has length of 8
accountFields.addField("AccountNumber", 15); //etc
accountFields.addField("Swift", 12);
//Format for clients' records
FixedWidthFields clientFields = new FixedWidthFields();
clientFields.addField("Lookahead", 5); //clients have their lookahead in a separate column
clientFields.addField("ClientID", 15, FieldAlignment.RIGHT, '0'); //let's pad client ID's with leading zeroes.
clientFields.addField("Name", 20);
FixedWidthWriterSettings settings = new FixedWidthWriterSettings();
settings.getFormat().setLineSeparator("\n");
settings.getFormat().setPadding('_');
//If a record starts with C#, it's a client record, so we associate "C#" with the client format.
settings.addFormatForLookahead("C#", clientFields);
//Rows starting with A# should be written using the account format
settings.addFormatForLookahead("A#", accountFields);
StringWriter out = new StringWriter();
//Let's write
FixedWidthWriter writer = new FixedWidthWriter(out, settings);
writer.writeRow(new Object[]{"C#",23234, "Miss Foo"});
writer.writeRow(new Object[]{"A#23234", "HSBC", "123433-000", "HSBCAUS"});
writer.writeRow(new Object[]{"A#234", "HSBC", "222343-130", "HSBCCAD"});
writer.writeRow(new Object[]{"C#",322, "Mr Bar"});
writer.writeRow(new Object[]{"A#1234", "CITI", "213343-130", "CITICAD"});
writer.close();
System.out.println(out.toString());
The output will be:
C#___000000000023234Miss Foo____________
A#23234___HSBC____123433-000_____HSBCAUS_____
A#234_____HSBC____222343-130_____HSBCCAD_____
C#___000000000000322Mr Bar______________
A#1234____CITI____213343-130_____CITICAD_____
This is just a rough example. There are many other options available, including support for annotated java beans, which you can find here.
Disclosure: I'm the author of this library, it's open-source and free (Apache 2.0 License)

The library Fixedformat4j is a pretty neat tool to do exactly this: http://fixedformat4j.ancientprogramming.com/

Spring Batch has a FlatFileItemWriter, but that won't help you unless you use the whole Spring Batch API.
But apart from that, I'd say you just need a library that makes writing to files easy (unless you want to write the whole IO code yourself).
Two that come to mind are:
Guava
Files.write(stringData, file, Charsets.UTF_8);
Commons / IO
FileUtils.writeStringToFile(file, stringData, "UTF-8");

I don't know of any framework, but you can just use RandomAccessFile. You can position the file pointer anywhere in the file to do your reads and writes.

I've just found a nice library that I'm using:
http://sourceforge.net/apps/trac/ffpojo/wiki
Very simple to configure with XML or annotations!

A simple way to write beans/entities to a flat file is to use ObjectOutputStream.
public static void writeToFile(File file, Serializable object) throws IOException {
// try-with-resources closes the stream even if writeObject throws
try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(file))) {
oos.writeObject(object);
}
}
You can write to a fixed length flat file with
FileUtils.writeByteArrayToFile(new File(filename), new byte[length]);
You need to be more specific about what you want to do with the file. ;)

Try the FFPOJO API; it has everything you need to create a fixed-length flat file, and it can also convert a file to an object and vice versa.
@PositionalRecord
public class CFTimeStamp {
String timeStamp;
public CFTimeStamp(String timeStamp) {
this.timeStamp = timeStamp;
}
@PositionalField(initialPosition = 1, finalPosition = 26, paddingAlign = PaddingAlign.RIGHT, paddingCharacter = '0')
public String getTimeStamp() {
return timeStamp;
}
@Override
public String toString() {
try {
FFPojoHelper ffPojo = FFPojoHelper.getInstance();
return ffPojo.parseToText(this);
} catch (FFPojoException ex) {
trsLogger.error(ex.getMessage(), ex);
}
return null;
}
}
