Parse data from non structured file - java

Hi, I would like to know how I can parse data from a non-structured file. The data is laid out like a table, but the columns are separated only by spaces. Here is an example:
DEST ULD ULD XXX NEXT NEXT XXX XXX
XXX/ XXX TYPE/ XXX XXX PCS WGT
BULK NBR SUBTYPE NBR DEST
XXX BULK BF 0
XXX BULK BB 39
XXX BULK BB 1
XXX BULK BF 0
XXX BULK BB 0
I can't use a delimiter like useDelimiter("\\s{2,9}") because the number of spaces changes between columns...
Any idea ?

What you have is called fixed-length (fixed-width) format. In some ways it is easier to parse, because every field starts at a known column offset. See: What's the best way of parsing a fixed-width formatted file in Java?
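For example, a minimal sketch of that idea (the file name and the column offsets here are hypothetical; measure the real offsets once from your file, since every row uses the same positions):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class FixedWidthParser {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("table.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Hypothetical column boundaries for illustration only.
                String dest    = slice(line, 0, 8);
                String uldType = slice(line, 8, 16);
                String subtype = slice(line, 16, 24);
                String pcs     = slice(line, 24, 32);
                System.out.printf("%s|%s|%s|%s%n", dest, uldType, subtype, pcs);
            }
        }
    }

    // Substring that tolerates short lines, then trims the padding spaces.
    private static String slice(String line, int from, int to) {
        if (from >= line.length()) return "";
        return line.substring(from, Math.min(to, line.length())).trim();
    }
}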


Use APDU commands to get some information for a card

I have a terminal that has its own API to establish a connection and send commands between the chip and the terminal. There is a function that transmits an APDU command and returns the answer in a byte array.
For example, if I want to read tag 5A (Application PAN), I send the following command:
byte[] byteArrayAPDU = new byte[]{(byte)0x00, (byte)0xCA, (byte)0x00, (byte)0x5A};
int nResult = SmartCardInterface.transmit(nCardHandle, byteArrayAPDU, byteArrayResponse);
The variable byteArrayResponse gets the response to the APDU command.
When I translate the value of byteArrayAPDU to a string of hexadecimal digits, it gives me: 00 CA 00 5A. And the response to that command is 6E 00 (class not supported).
My device uses ISO 7816 as its technical specification. Is the way in which I am sending APDU commands correct? I ask this because I have read that an APDU command must have at least 5 values, but I don't know what to send as the fifth parameter. I don't know what the length of the response is.
Can you give an example of how to get the tag 5A or something else in APDU commands?
If the command were correct, then in place of the 6E 00 I see at the moment, would I see the information as plain text when the response is cast to a string?
The input and output values that you showed in your question suggest that your use of the method transmit() is correct, i.e. the second argument is a command APDU and the third argument is filled with the response APDU:
resultCode = SmartCardInterface.transmit(cardHandle, commandAPDU, responseAPDU);
Your question regarding the format and validity of APDU commands is rather broad. In general, the format of APDUs and a basic set of commands is defined in ISO/IEC 7816-4. Since you tagged the question with emv and mention the application primary account number, you are probably interacting with some form of EMV payment card (e.g. a credit or debit card from one of the major schemes). In that case, you would probably want to study the various specifications for EMV payment systems which define the data structures and application-specific commands for those cards.
Regarding your specific questions:
Do APDUs always consist of at least 5 bytes?
No, certainly not. Command APDUs consist of at least 4 bytes (the header bytes). These are
+-----+-----+-----+-----+
| CLA | INS | P1 | P2 |
+-----+-----+-----+-----+
Such a 4-byte APDU is called "case 1". This means that the command APDU does not contain a data field sent to the card and that the card is not expected to generate a response data field. So the response APDU is expected to only contain a response status word:
+-----+-----+
| SW1 | SW2 |
+-----+-----+
What is the 5th byte of a command APDU?
The 5th byte is a length field (or part of a length field in case of extended length APDUs, which I won't further explain in this post). Depending on the case, this length field may have two meanings:
If the command APDU does not have a data field, that length field indicates the expected length (Ne) of the response data field:
+-----+-----+-----+-----+-----+
| CLA | INS | P1 | P2 | Le |
+-----+-----+-----+-----+-----+
Le = 0x01 .. 0xFF: This means that the expected response data length Ne is 1, 2, ... 255 bytes (i.e. exactly the value of Le).
Le = 0x00: This means that the expected response data length Ne is 256 bytes. This is typically used to instruct the card to give you as many bytes as it has available (up to 256 bytes). So even if Le is set to 0x00, you won't always get exactly 256 bytes from the card.
If the command APDU itself has a data field, that length field indicates the length (Nc) of the command data field:
+-----+-----+-----+-----+-----+-----------------+
| CLA | INS | P1 | P2 | Lc | DATA (Nc bytes) |
+-----+-----+-----+-----+-----+-----------------+
Lc = 0x01 .. 0xFF: This means that the command data length Nc is 1, 2, ... 255 bytes (i.e. exactly the value of Lc).
Lc = 0x00: This is used to indicate an extended length APDU.
If there is a command data field and the command is expected to generate response data, that command APDU may again be followed by an Le field:
+-----+-----+-----+-----+-----+-----------------+-----+
| CLA | INS | P1 | P2 | Lc | DATA (Nc bytes) | Le |
+-----+-----+-----+-----+-----+-----------------+-----+
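To make the cases concrete, here is a minimal Java sketch of what each shape looks like as a byte array (the CLA/INS/P1/P2 values are placeholders taken from the examples above, not commands your card necessarily supports):

// Case 1: header only; no command data, no response data expected.
byte[] case1 = {(byte) 0x00, (byte) 0xCA, (byte) 0x00, (byte) 0x5A};

// Case 2: header + Le; no command data, response data expected.
byte[] case2 = {(byte) 0x00, (byte) 0xCA, (byte) 0x00, (byte) 0x5A, (byte) 0x00};

// Case 3: header + Lc + data; command data, no response data expected.
// Example data field: the file identifier 3F 00 (the master file).
byte[] data = {(byte) 0x3F, (byte) 0x00};
byte[] case3 = new byte[5 + data.length];
case3[0] = (byte) 0x00;              // CLA
case3[1] = (byte) 0xA4;              // INS
case3[2] = (byte) 0x00;              // P1
case3[3] = (byte) 0x00;              // P2
case3[4] = (byte) data.length;       // Lc
System.arraycopy(data, 0, case3, 5, data.length);

// Case 4: header + Lc + data + Le; command data and response data expected.
byte[] case4 = java.util.Arrays.copyOf(case3, case3.length + 1);
case4[case4.length - 1] = (byte) 0x00;   // Le = 0x00 (up to 256 response bytes)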
Is the command 00 CA 00 5A correct?
Probably not, for several reasons:
Since you expect the card to deliver a response data field (i.e. the data object 0x5A), you need to specify an Le field. Hence, a valid format would be
+------+------+------+------+------+
| CLA | INS | P1 | P2 | Le |
+------+------+------+------+------+
| 0x00 | 0xCA | 0x00 | 0x5A | 0x00 |
+------+------+------+------+------+
You receive the status word 6E 00 in response to the command. The meaning of this status word is "class not supported". This indicates that commands with the CLA byte set to 0x00 are not supported in the current state. With some cards this also simply means that this combination of CLA and INS (00 CA) is not supported, even though this contradicts the definition in ISO/IEC 7816-4.
Overall, you can assume that your card does not support this command in its current execution state.
Assuming you are interacting with an EMV payment card, you typically need to select an application first. Your question does not indicate whether you already do this, so I assume you don't right now. Selecting an application is done by sending a SELECT (by AID) command:
+------+------+------+------+------+-----------------+------+
| CLA  | INS  | P1   | P2   | Lc   | DATA            | Le   |
+------+------+------+------+------+-----------------+------+
| 0x00 | 0xA4 | 0x04 | 0x00 | 0xXX | Application AID | 0x00 |
+------+------+------+------+------+-----------------+------+
The value of the application AID, of course, depends on the card application and may be obtained by following the discovery procedures defined in the EMV specifications.
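For instance, a minimal sketch of building and sending such a SELECT with the transmit() API from your question (the AID used here is the well-known Visa AID A0 00 00 00 03 10 10, purely as an example; substitute whatever AID your card actually hosts):

byte[] aid = {(byte) 0xA0, 0x00, 0x00, 0x00, 0x03, 0x10, 0x10}; // example AID only
byte[] select = new byte[5 + aid.length + 1];
select[0] = (byte) 0x00;                 // CLA
select[1] = (byte) 0xA4;                 // INS: SELECT
select[2] = (byte) 0x04;                 // P1: select by AID
select[3] = (byte) 0x00;                 // P2
select[4] = (byte) aid.length;           // Lc
System.arraycopy(aid, 0, select, 5, aid.length);
select[select.length - 1] = (byte) 0x00; // Le

byte[] byteArrayResponse = new byte[258]; // room for 256 data bytes + SW1/SW2
int nResult = SmartCardInterface.transmit(nCardHandle, select, byteArrayResponse);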
Even after application selection, the GET DATA APDU command for EMV applications is defined in the proprietary class. Consequently, the CLA byte must be set to 0x80:
+------+------+------+------+------+
| CLA | INS | P1 | P2 | Le |
+------+------+------+------+------+
| 0x80 | 0xCA | 0x00 | 0x5A | 0x00 |
+------+------+------+------+------+
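In the byte-array form of your original attempt, that command would look like this (again a sketch on top of your terminal's API):

// GET DATA (proprietary class): 80 CA 00 5A 00
byte[] getData = {(byte) 0x80, (byte) 0xCA, (byte) 0x00, (byte) 0x5A, (byte) 0x00};
byte[] byteArrayResponse = new byte[258];
int nResult = SmartCardInterface.transmit(nCardHandle, getData, byteArrayResponse);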
Finally, even then, I'm not aware of any schemes where cards would allow you to retrieve the PAN through a GET DATA command. Usually, the PAN is only accessible through file/record based access. Since you did not reveal the specific type/brand of your card, it's impossible to tell what your card may or may not actually support.
First, some background: the ISO 7816 standard consists of several parts.
When terminal device vendors state that a reader conforms to ISO 7816, they usually only confirm that the common physical characteristics (Part 1), dimensions and contacts (Part 2), and transmission protocol (Part 3) apply to the device.
The APDU commands and responses defined in ISO 7816 Part 4 (and a few other parts) are generic definitions and might not be fully supported by your smart card.
You need to learn about the card-terminal interaction layers relevant to your card type:
EMV is the customized version of ISO 7816 for payment cards.
The global card brands use their own customized specifications based on EMV and ISO 7816, for example Amex "AEIPS", Diners "D-PAS", MasterCard "M/Chip", Visa "VIS", etc. They are almost the same, with small differences in the supported commands, flows and lists of tags.
Unfortunately, most payment cards are not supposed to return the tag 0x5A value via a GET DATA APDU command. Usually you need to follow the payment procedure: at a minimum, SELECT the card application and READ the tag values from the card's SFI records.
According to EMV, the GET DATA P1/P2 values should only be used for tags 0x9F36, 0x9F13, 0x9F17, or 0x9F4F.
Answering your questions:
What to send in the fifth parameter? What is the length of the response?
The fifth byte is known as "Le", the length of expected data. You can try using Le = 0x00.
If the APDU command is supported by the card, you may get SW1 SW2 = 0x6Cxx, where xx is the hexadecimal length of the requested data. Then you can repeat the same command with the correct Le value.
For example, to read the PIN Try Counter:
Get Data (Tag = '9F 17')
Request : 80 CA 9F 17 00
Response: 6C 04
SW1 SW2: 6C 04 (SW_Warning Wrong length(Le))
Get Data (Tag = '9F 17')
Request : 80 CA 9F 17 04
Response: 9F 17 01 03 90 00
Data : 9F 17 01 03 // Tag + Length + Value
Tag 9F 17: Personal Identification Number (PIN) Try Counter : 03
SW1 SW2 : 90 00 (SW_OK)
If the command were satisfactory, would I see, in place of the 6E 00 I get at the moment, the information as plain text when the answer is cast to a string?
APDU commands and responses use BYTE encoding. With the terminal API from your example, you will get an array of bytes.
As a developer you can transform the bytes into the desired format or use them as-is. Please keep in mind that, according to the EMV specifications, the formats of tag data vary:
HEX (or binary), for example for numeric tags like amounts;
BCD, for example for date/time or some numbers like currency codes (the PAN is also BCD-encoded);
strings in various charsets (ASCII, Unicode, ...), for example for the Cardholder Name or Application Name;
etc.
Tag 0x5A, the Application Primary Account Number (PAN), is encoded as BCD and can be padded with 0xF if the PAN has an odd number of digits.
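For illustration, a minimal sketch of decoding such a BCD-encoded PAN (the input is just the tag's value bytes, without the tag and length bytes):

// Each byte holds two decimal digits; a 0xF nibble marks odd-length padding.
static String decodeBcdPan(byte[] value) {
    StringBuilder pan = new StringBuilder(value.length * 2);
    for (byte b : value) {
        int hi = (b >> 4) & 0x0F;
        int lo = b & 0x0F;
        if (hi == 0x0F) break;   // padding nibble: stop
        pan.append(hi);
        if (lo == 0x0F) break;   // padding nibble: stop
        pan.append(lo);
    }
    return pan.toString();
}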
This is just an answer to how to READ your specific tag data, since the APDU format and application-state behaviour are already covered by the other answers.
After you SELECT the application, you can issue GET PROCESSING OPTIONS. This is the actual start of the transaction. In the response you will get a tag named AFL (Application File Locator). You need to parse this element and do multiple READ RECORDs until you find the data.
The AFL is a set of four-byte entries (if it covers two SFIs, there will be eight bytes of data):
The first byte denotes the SFI (its 5 most significant bits are the input to P2 of READ RECORD). The second byte denotes the first record to read (the input to P1 of READ RECORD). The third byte denotes the last record to read (you need to loop READ RECORD up to that record). The fourth byte denotes the number of records involved in offline data authentication.
As you parse through the records, you will find your required data. In case you are not sure how to parse the TLV structures, copy the hex data into an online TLV decoder and inspect it.
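A minimal sketch of walking the AFL and issuing the READ RECORD loop described above, again on top of the transmit() API from the question (error and status-word handling omitted):

// 'afl' holds the concatenated 4-byte AFL entries from GET PROCESSING OPTIONS.
for (int i = 0; i + 3 < afl.length; i += 4) {
    int sfi = (afl[i] >> 3) & 0x1F;      // 5 most significant bits of the first byte
    int first = afl[i + 1] & 0xFF;       // first record to read
    int last = afl[i + 2] & 0xFF;        // last record to read
    for (int rec = first; rec <= last; rec++) {
        byte[] readRecord = {
            (byte) 0x00, (byte) 0xB2,    // CLA, INS: READ RECORD
            (byte) rec,                  // P1: record number
            (byte) ((sfi << 3) | 0x04),  // P2: SFI, plus "P1 is a record number"
            (byte) 0x00                  // Le
        };
        byte[] response = new byte[258];
        SmartCardInterface.transmit(nCardHandle, readRecord, response);
        // TLV-parse 'response' here and look for tag 0x5A.
    }
}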

DB2 UTF-8 XML C2 85 to new line conversion

We have a problem when saving XML data (UTF-8 encoded) to a DB2 9.7 LUW table.
Table DDL:
CREATE TABLE DB2ADMIN.TABLE_FOR_XML
(
ID INTEGER NOT NULL,
XML_FIELD XML NOT NULL
)
The problem occurs in some rare cases with rare Unicode characters; we are using the Java JDBC DB2 driver.
For example, looking at it in an editor in normal mode rather than hex view (Notepad++), the strange Â below (after "16.") is displayed as NEL in a black square.
The input XML is UTF-8 encoded and, viewed in a hex editor, has these values:
00000010h: 31 36 2E 20 C2 85 42 ; 16. Â…B
After inserting into DB2, I presume some kind of conversion occurs, because when selecting the data back the same characters are now:
00000010h: 31 36 2E 20 0D 0A 42 ; 16. ..B
So C2 85 is transformed into 0D 0A, which is a new line.
Another thing I noticed: although when saving the XML into the table the content started with the header
<?xml version="1.0" encoding="UTF-8"?>
after fetching the XML from DB2 the content started with
<?xml version="1.0" encoding="UTF-16"?>
Is there a way to force DB2 to store XML in UTF-8 without conversions? Fetching with XMLSERIALIZE didn't help:
SELECT XML_FIELD AS CONTENT1, XMLSERIALIZE(XML_FIELD as cLOB(1M)) AS CONTENT2 from DB2ADMIN.TABLE_FOR_XML
In CONTENT2 there is no XML header, but the newline is still there.
This behaviour is standard for XML 1.1 processors. XML 1.1 §2.11:
the XML processor must behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating [the single character #x85] to a single #xA character
Line-ending type is one of the many details of a document that will be lost over a parse-and-serialise cycle (e.g. attribute order, whitespace in tags, numeric character references...).
It's slightly surprising that DB2's XML fields use XML 1.1, since not much uses that revision of XML, but not super-surprising in that support for NEL (an ancient, useless mainframe line-ending character) is something only IBM ever wanted.
Is there way to force db2 to store XML in UTF-8 without conversions ?
Use a BLOB?
If you need both native-XML-field functionality and to retain the exact original serialised form of a document then you'll need two columns.
(Are you sure you need to retain NEL line endings? Nobody usually cares about line endings, and these are pretty bogus.)
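If you go the BLOB route, a minimal JDBC sketch of the two-column idea (the table and column names here are hypothetical; the XML column keeps the queryable copy, the BLOB keeps the exact bytes):

import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical table: ID INTEGER, XML_FIELD XML, XML_RAW BLOB
static void insertXml(Connection connection, int id, String xml) throws SQLException {
    String sql = "INSERT INTO DB2ADMIN.TABLE_FOR_XML (ID, XML_FIELD, XML_RAW) VALUES (?, ?, ?)";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        ps.setInt(1, id);
        ps.setString(2, xml);                                 // parsed, queryable XML value
        ps.setBytes(3, xml.getBytes(StandardCharsets.UTF_8)); // byte-exact original
        ps.executeUpdate();
    }
}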
As I don't generally need non-readable characters, before saving the XML string to DB2 I decided to clean the string of x'C285' (code point 133) and, just in case, of 4-byte UTF-8 characters.
I found a similar example (How to replace/remove 4(+)-byte characters from a UTF-8 string in Java?) and adjusted it:
public static final String LAST_3_BYTE_UTF_CHAR = "\uFFFF";
public static final String REPLACEMENT_CHAR = "\uFFFD";

public static String toValid3ByteUTF8String(String line) {
    final int length = line.length();
    StringBuilder b = new StringBuilder(length);
    for (int offset = 0; offset < length; ) {
        final int codepoint = line.codePointAt(offset);
        if (codepoint > LAST_3_BYTE_UTF_CHAR.codePointAt(0)) {
            // Supplementary code point (4 bytes in UTF-8): replace it.
            b.append(REPLACEMENT_CHAR);
        } else if (codepoint == 133) {
            // NEL (U+0085, x'C285' in UTF-8): replace it.
            b.append(REPLACEMENT_CHAR);
        } else if (Character.isValidCodePoint(codepoint)) {
            b.appendCodePoint(codepoint);
        } else {
            b.append(REPLACEMENT_CHAR);
        }
        offset += Character.charCount(codepoint);
    }
    return b.toString();
}
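Usage before the insert then looks like this (the variable names are hypothetical):

String cleaned = toValid3ByteUTF8String(rawXml);
// bind 'cleaned' instead of the raw string to the XML parameter of the INSERT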

Load Social Network Data into Neo4J

I have a dataset similar to Twitter's graph. The data is in the following form:
<user-id1> <List of ids which he follows separated by spaces>
<user-id2> <List of ids which he follows separated by spaces>
...
I want to model this in the form of a unidirectional graph, expressed in the cypher syntax as:
(A:Follower)-[:FOLLOWS]->(B:Followee)
The same user can appear more than once in the dataset as he might be in the friend list of more than one person, and he might also have his friend list as part of the data set. The challenge here is to make sure that there are no duplicate nodes for any user. And if the user appears as a Follower and Followee both in the data set, then the node's label should have both the values, i.e., Follower:Followee. There are about 980k nodes in the graph and size of dataset is 1.4 GB.
I am not sure if Cypher's LOAD CSV will work here, because each line of the dataset has a variable number of columns, making it impossible to write a query that generates the nodes for each of the columns. So what would be the best way to import this data into Neo4j without creating any duplicates?
I actually did exactly the same thing for the Friendster dataset, which has almost the same format as yours.
There the separator between the user and the friend list was ":", and the friends themselves were comma-separated.
The queries I used there are these:
create index on :User(id);
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
MERGE (u1:User {id:line[0]})
;
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
WITH line[1] as id2
WHERE id2 <> '' AND id2 <> 'private' AND id2 <> 'notfound'
UNWIND split(id2,",") as id
WITH distinct id
MERGE (:User {id:id})
;
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///home/michael/import/friendster/friends-000______.txt" as line FIELDTERMINATOR ":"
WITH line[0] as id1, line[1] as id2
WHERE id2 <> '' AND id2 <> 'private' AND id2 <> 'notfound'
MATCH (u1:User {id:id1})
UNWIND split(id2,",") as id
MATCH (u2:User {id:id})
CREATE (u1)-[:FRIEND_OF]->(u2)
;

Parsing a complicated CSV file

I am in a difficult situation where I need to make a parser for a formatted document from Tekla, to be processed into a database.
So in the .CSV I have this:
,SMS-PW-BM31,,1,,,,287.9
,,SMS-PW-BM31,1,H350*175*7*11,SS400,5805,287.9
,------------,--------------,----,---------------,--------,------------,---------
,SMS-PW-BM32,,1,,,,405.8
,,SMSPW-H707,1,H350*175*7*11,SS400,6697,332.2
,,SMSPW-EN12,1,PLT12x175,SS400,500,8.2
,,SMSPW-EN14,1,PLT16x175,SS400,500,11
,------------,--------------,----,---------------,--------,------------,---------
That is the document generated by the Tekla software. What I expect as output is something like this:
HEAD-MARK    COMPONENT-TYPE  QUANTITY  PROFILE        GRADE  LENGTH  WEIGHT
SMS-PW-BM31                  1                                       287.9
SMS-PW-BM31  SMS-PW-BM31     1         H350*175*7*11  SS400  5805    287.9
SMS-PW-BM32                  1                                       405.8
SMS-PW-BM32  SMSPW-H707      1         H350*175*7*11  SS400  6697    332.2
SMS-PW-BM32  SMSPW-EN12      1         PLT12X175      SS400  500     8.2
SMS-PW-BM32  SMSPW-EN14      1         PLT16X175      SS400  500     11
How do I start on this in Java? The most complicated part is distributing the head mark to the rows of its group, where the groups are separated by the '-' rows.
The CSV format is quite simple: there is a column delimiter, a comma (,), and a row delimiter, a new line (\n). Some columns may be surrounded by quotes (") to contain column data, but it looks like you won't have to worry about that given your current file.
Look at String.split and you will find your answer after a bit of pondering.
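A minimal sketch of that idea (the file name is hypothetical, and the column layout is taken from your sample: the head mark comes from each group's first row and is copied onto the detail rows that follow, while the dashed separator rows are skipped):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class TeklaCsvParser {
    public static void main(String[] args) throws IOException {
        String headMark = null;
        try (BufferedReader reader = new BufferedReader(new FileReader("tekla.csv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains("---")) continue;   // separator row: end of group
                // -1 keeps trailing empty fields; f[0] is always empty (leading comma)
                String[] f = line.split(",", -1);
                if (!f[1].isEmpty()) {
                    headMark = f[1];                  // head row: remember the mark
                    System.out.printf("%s  %s  %s%n", headMark, f[3], f[7]);
                } else {
                    // detail row: inherit the current head mark
                    System.out.printf("%s  %s  %s  %s  %s  %s  %s%n",
                            headMark, f[2], f[3], f[4], f[5], f[6], f[7]);
                }
            }
        }
    }
}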

Removing spaces and saving as list of objects

I'm just wondering what would be the most appropriate/most efficient way to remove the spaces in a data file (.txt) and save the results as a list of objects.
Here is a snippet of the data:
2014-03-24 19:11:42.838 7611.668 UDP 192.168.0.15:5353 -> 224.0.0.251:5353 53 5353 12
2014-03-24 19:03:30.710 8061.709 UDP 192.168.0.12:137 -> 192.168.0.255:137 374 30432 9
2014-03-24 19:13:55.651 7246.821 UDP 192.168.0.21:1024 -> 255.255.255.255:1900 24 9640 8
I'm just looking to save them as a List of Flow objects.
Just use:
String s = data.replaceAll("\\s", "");
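If the goal is a List of objects rather than one stripped string, here is a minimal sketch instead; it splits each line on runs of whitespace rather than removing them (the Flow class, its fields, and the file name are hypothetical, based on the columns visible in your sample):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class Flow {
    String date, time, protocol, source, destination;
    // remaining numeric columns left out for brevity
    Flow(String[] f) {
        date = f[0]; time = f[1]; protocol = f[3];
        source = f[4]; destination = f[6];   // f[5] is the "->" arrow
    }
}

public class FlowLoader {
    public static void main(String[] args) throws IOException {
        List<Flow> flows = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("flows.txt"))) {
            if (line.trim().isEmpty()) continue;
            // Collapse runs of spaces into single separators, then split.
            flows.add(new Flow(line.trim().split("\\s+")));
        }
        System.out.println(flows.size() + " flows loaded");
    }
}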
