How to edit an entry-sequenced Enscribe file - Java

I need some help with this problem. It looks simple, but I could not solve it. I have an entry-sequenced file with variable-length records. I only need to replace the first 3 bytes with "XXX", so I have to rebuild the whole file. The problem I am getting is that I am changing the length of every record by padding it with nulls, because I have no way of knowing beforehand how many bytes were written for each record.
For example I have this file with three records:
AAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCC
DDDDDDDDDDDDDD
The file has a REC attribute of 26 (equal to the length of the second record). When I execute my program to change the first three letters, the file ends up like this (read "N" as a null character):
AAAAAAAAAAAAAAAANNNNNNNNNN
BBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCNNNNNNNNNNNNNNNNNNNNN
DDDDDDDDDDDDDDNNNNNNNNNNNN
How can I change my program to get what I want?
XXXAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCC
DDDDDDDDDDDDDD
This is my code (Java):
EnscribeFile p_origin = new EnscribeFile(file);
String first_record;
byte buffer[];
//First, load all records and then purge the file content
ArrayList<byte[]> records = new ArrayList<byte[]>();
p_origin.open(EnscribeOpenOptions.READ_WRITE, EnscribeOpenOptions.SHARED);
EnscribeFileAttributes et = p_origin.getFileInfo();
buffer = new byte[et.getRecordLength()];
while ( p_origin.read(buffer, et.getRecordLength()) != EnscribeFile.POSITION_UNUSED )
{
    byte auxRecord[] = new byte[et.getRecordLength()];
    System.arraycopy(buffer, 0, auxRecord, 0, et.getRecordLength());
    buffer = new byte[et.getRecordLength()];
    records.add(auxRecord);
}
p_origin.purgeData();
//Second, modify first record
first_record = new String(records.get(0));
first_record = "XXX" + first_record.substring(3);
records.set(0,first_record.getBytes());
//Third, rewrite the records and close the file
Iterator<byte[]> i = records.iterator();
while ( i.hasNext() )
{
    byte aux[] = i.next();
    p_origin.write(aux, et.getRecordLength()); //Check the note
}
p_origin.close();
Note: before writing, I cannot just trim each record back to the last character before the first null, because nulls at the end of a record are possible and acceptable. Example (remember "N" is a null):
AAAAAAAAAAAAAAAANN
BBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCNN
DDDDDDDDDDDDDDNN
It must equal this after the process:
XXXAAAAAAAAAAAAANN
BBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCNN
DDDDDDDDDDDDDDNN

OK, I found the solution on another forum. It is very simple. The method
p_origin.read(...)
returns the number of bytes read, which is exactly the length I did not know, so I simply save that length in a variable before creating the new record. With some changes the code becomes:
EnscribeFile p_origin = new EnscribeFile(file);
String first_record;
byte buffer[];
//First, load all records and then purge the file content
ArrayList<byte[]> records = new ArrayList<byte[]>();
p_origin.open(EnscribeOpenOptions.READ_WRITE, EnscribeOpenOptions.SHARED);
EnscribeFileAttributes et = p_origin.getFileInfo();
buffer = new byte[et.getRecordLength()];
int aux_len = p_origin.read(buffer, et.getRecordLength());
while ( aux_len != EnscribeFile.POSITION_UNUSED )
{
    byte auxRecord[] = new byte[aux_len];
    System.arraycopy(buffer, 0, auxRecord, 0, aux_len); //copy only the bytes actually read
    records.add(auxRecord);
    aux_len = p_origin.read(buffer, et.getRecordLength());
}
p_origin.purgeData();
//Second, modify first record
first_record = new String(records.get(0));
first_record = "XXX" + first_record.substring(3);
records.set(0,first_record.getBytes());
//Third, rewrite the records and close the file
Iterator<byte[]> i = records.iterator();
while( i.hasNext() )
{
    byte aux_byte[] = i.next();
    p_origin.write(aux_byte, aux_byte.length);
}
p_origin.close();


How do I save and retrieve specific data from a CSV file without headers in Java?

I am writing an application which needs to load a large CSV file that is pure data and doesn't contain any headers.
I am using the fastCSV library to parse the file; however, the data needs to be stored and specific fields need to be retrieved. Since not all of the data is necessary, I am skipping every third line.
Is there a way to set the headers after the file has been parsed and save it in a data structure such as an ArrayList?
Here is the function which loads the file:
public void fastCsv(String filePath) {
    File file = new File(filePath);
    CsvReader csvReader = new CsvReader();
    int linecounter = 1;
    long start = System.currentTimeMillis();
    try (CsvParser csvParser = csvReader.parse(file, StandardCharsets.UTF_8)) {
        CsvRow row;
        while ((row = csvParser.nextRow()) != null) {
            if ((linecounter % 3) > 0) { //skip every third line
                // System.out.println("Read line: " + row);
                // System.out.println("First column of line: " + row.getField(0));
                System.out.println(row);
            }
            linecounter++;
        }
        long elapsedTime = System.currentTimeMillis() - start;
        System.out.println("Execution Time in ms: " + elapsedTime);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Any insight would be greatly appreciated.
univocity-parsers supports field selection and can do this very easily. It's also faster than the library you are using.
Here's how you can use it to select columns of interest:
Input
String input = "X, X2, Symbol, Date, Open, High, Low, Close, Volume\n" +
" 5, 9, AAPL, 01-Jan-2015, 110.38, 110.38, 110.38, 110.38, 0\n" +
" 2710, 289, AAPL, 01-Jan-2015, 110.38, 110.38, 110.38, 110.38, 0\n" +
" 5415, 6500, AAPL, 02-Jan-2015, 111.39, 111.44, 107.35, 109.33, 53204600";
Configure
CsvParserSettings settings = new CsvParserSettings(); //many options here, check the tutorial
settings.setHeaderExtractionEnabled(true); //tells the parser to use the first row as the header row
settings.selectFields("X", "X2"); //selects the fields
Parse and print results
CsvParser parser = new CsvParser(settings);
for(String[] row : parser.iterate(new StringReader(input))){
System.out.println(Arrays.toString(row));
}
Output
[5, 9]
[2710, 289]
[5415, 6500]
With field selection you can use any sequence of fields and have rows with different column counts; the parser will handle this just fine. No need to write complex logic for that.
To process the File in your code, change the example above to do this:
for(String[] row : parser.iterate(new File(filePath))){
... //your logic goes here.
}
If you want a more usable record (with typed values), use this instead:
for(Record record : parser.iterateRecords(new File(filePath))){
... //your logic goes here.
}
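For example, typed access on each Record could look like the following sketch (it assumes the field selection shown earlier is still in place and uses the Record getters for the selected columns):
for (Record record : parser.iterateRecords(new File(filePath))) {
    Integer x = record.getInt("X");   // typed access to the selected "X" column
    Integer x2 = record.getInt("X2"); // typed access to the selected "X2" column
    System.out.println(x + ", " + x2);
}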
Speeding up
The fastest way of processing the file is through a RowProcessor. That's a callback that receives the rows parsed from the input:
settings.setProcessor(new AbstractRowProcessor() {
    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        System.out.println(Arrays.toString(row));
        context.skipLines(3); //use the context object to control the parser
    }
});
CsvParser parser = new CsvParser(settings);
//`parse` doesn't return anything. Rows go to the `rowProcessed` method.
parser.parse(new StringReader(input));
You should be able to parse very large files pretty quickly. If things slow down, look at your own code (avoid adding values to lists or collections in memory, or at least pre-allocate the collections to a good size, and give the JVM a large amount of memory to work with using the -Xms and -Xmx flags).
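As a rough sketch of that advice, reusing the AbstractRowProcessor callback shown above (the initial capacity is just a guess you would tune to your input size):
final List<String[]> rows = new ArrayList<>(1_000_000); // pre-sized to avoid repeated resizing
settings.setProcessor(new AbstractRowProcessor() {
    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        rows.add(row); // keep the parsed array as-is, no extra copying
    }
});
new CsvParser(settings).parse(new File(filePath));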
Right now this parser is the fastest you can find. I made this performance comparison a while ago, which you can use for reference.
Hope this helps
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license)
Do you know which fields/columns you want to keep, and what you'd like the "header" value to be? I.e., do you want the first and third columns, called "first" and "third"? If so, you could build a HashMap of string/object pairs (or another appropriate type, depending on your actual data and needs) and add each HashMap to an ArrayList. This should get you going; just be sure to change the HashMap types as needed.
ArrayList<HashMap<String,String>> arr = new ArrayList<>();
while ((row = csvParser.nextRow()) != null) {
    if ((linecounter % 3) > 0) {
        // System.out.println("Read line: " + row);
        // System.out.println("First column of line: " + row.getField(0));
        // keep col1 and col3; use a fresh map per row so the entries in the list stay distinct
        HashMap<String,String> hm = new HashMap<>();
        hm.put("first", row.getField(0));
        hm.put("third", row.getField(2));
        arr.add(hm);
    }
    linecounter++;
}
If you want to capture all columns, you can use a similar technique, but I'd build a mapping data structure so that you can match field indexes to column header names in a loop, adding each column to the HashMap that is then stored in the ArrayList; see the sketch below.
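A minimal sketch of that index-to-name mapping, assuming the parsed fields arrive as a plain list of strings (the indexes and names here are hypothetical, matching the "first"/"third" example above):
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowMapper {
    // Field index -> desired column name; adjust to your data.
    private static final Map<Integer, String> HEADERS = new LinkedHashMap<>();
    static {
        HEADERS.put(0, "first");
        HEADERS.put(2, "third");
    }

    // Turn one parsed row into a map keyed by your chosen column names.
    public static HashMap<String, String> toNamedRow(List<String> fields) {
        HashMap<String, String> namedRow = new HashMap<>();
        for (Map.Entry<Integer, String> e : HEADERS.entrySet()) {
            if (e.getKey() < fields.size()) {
                namedRow.put(e.getValue(), fields.get(e.getKey()));
            }
        }
        return namedRow;
    }
}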

How can I read user data (memory) from an EPC RFID tag through LLRP?

I encoded two EPC tags with "NiceLabel Pro" using the following data:
First tag: EPC: 555555555, UserData: 9876543210123456789
Second tag: EPC: 444444444, UserData: 123456789123456789
Now I'm trying to get that data through LLRP (in my Java application):
My LLRPClient (one function):
public void PrepareInventoryRequest() {
AccessCommand accessCommand = new AccessCommand();
// A list to hold the op specs for this access command.
accessCommand.setAccessCommandOpSpecList(GenerateOpSpecList());
// Create a new tag spec.
C1G2TagSpec tagSpec = new C1G2TagSpec();
C1G2TargetTag targetTag = new C1G2TargetTag();
targetTag.setMatch(new Bit(1));
// We want to check memory bank 1 (the EPC memory bank).
TwoBitField memBank = new TwoBitField("2");
targetTag.setMB(memBank);
// The EPC data starts at offset 0x20.
// Start reading or writing from there.
targetTag.setPointer(new UnsignedShort(0));
// This is the mask we'll use to compare the EPC.
// We want to match all bits of the EPC, so all mask bits are set.
BitArray_HEX tagMask = new BitArray_HEX("00");
targetTag.setTagMask(tagMask);
// We only want to operate on tags with this EPC.
BitArray_HEX tagData = new BitArray_HEX("00");
targetTag.setTagData(tagData);
// Add a list of target tags to the tag spec.
List <C1G2TargetTag> targetTagList =
new ArrayList<>();
targetTagList.add(targetTag);
tagSpec.setC1G2TargetTagList(targetTagList);
// Add the tag spec to the access command.
accessCommand.setAirProtocolTagSpec(tagSpec);
accessSpec.setAccessCommand(accessCommand);
...
private List<AccessCommandOpSpec> GenerateOpSpecList() {
// A list to hold the op specs for this access command.
List <AccessCommandOpSpec> opSpecList =
new ArrayList<>();
// Set the default OpSpec for the event cycle of AccessSpec 3.
C1G2Read opSpec1 = new C1G2Read();
// Set the OpSpecID to a unique number.
opSpec1.setOpSpecID(new UnsignedShort(1));
opSpec1.setAccessPassword(new UnsignedInteger(0));
// We'll read from user memory (bank 3).
TwoBitField opMemBank = new TwoBitField("3");
opSpec1.setMB(opMemBank);
// We'll read from the base of this memory bank (0x00).
opSpec1.setWordPointer(new UnsignedShort(0));
// A word count of 0 means read the whole memory bank.
opSpec1.setWordCount(new UnsignedShort(0));
opSpecList.add(opSpec1);
return opSpecList;
}
My tag handler function:
private void updateTable(TagReportData tag) {
if (tag != null) {
EPCParameter epcParam = tag.getEPCParameter();
String EPCStr;
List<AccessCommandOpSpecResult> accessResultList = tag.getAccessCommandOpSpecResultList();
for (AccessCommandOpSpecResult accessResult : accessResultList) {
if (accessResult instanceof C1G2ReadOpSpecResult) {
C1G2ReadOpSpecResult op = (C1G2ReadOpSpecResult) accessResult;
if ((op.getResult().intValue() == C1G2ReadResultType.Success) &&
(op.getOpSpecID().intValue() < 1000)) {
UnsignedShortArray_HEX userMemoryHex = op.getReadData();
System.out.println("User Memory read from the tag is = " + userMemoryHex.toString());
}
}
}
...
For the first tag, "userMemoryHex.toString()" = "3938 3736"
For the second tag, "userMemoryHex.toString()" = "3132 3334"
Why? How do I get all user data?
This is my rfid tag.
The values that you get seem to be the first 4 characters of the number (interpreted as an ASCII string):
39383736 = "9876" (when interpreting those 4 bytes as ASCII characters)
31323334 = "1234" (when interpreting those 4 bytes as ASCII characters)
Since the specification of your tag says
Memory: EPC 128 bits, User 32 bits
your tag can only contain 32 bits (= 4 bytes) of user data. Hence, your tag simply can't contain the full value (i.e. 9876543210123456789 or 123456789123456789) that you tried to write as UserData (regardless of whether this was interpreted as a decimal number or a string).
Instead, your writer application seems to have taken the first 4 characters of those values, encoded them in ASCII, and written them to the tag.
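To illustrate that decoding, here is a small standalone sketch that interprets the hex words returned by the reader as ASCII text:
public class HexToAscii {
    public static void main(String[] args) {
        String[] reads = {"39383736", "31323334"}; // hex words reported for the two tags
        for (String hex : reads) {
            StringBuilder ascii = new StringBuilder();
            for (int i = 0; i < hex.length(); i += 2) {
                ascii.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
            }
            System.out.println(hex + " -> " + ascii); // prints 9876 and 1234
        }
    }
}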

How to export repeat grid layout data to Excel using pzRDExportWrapper in Pega 7.1.8?

I am trying to export repeat grid data to Excel. To do this, I have provided a button which runs the "MyCustomActivity" activity when clicked. The button is placed above the grid in the same layout. It is also worth pointing out that I am using an article as a guide for the configuration. According to the guide, my "MyCustomActivity" activity contains two steps:
Method: Property-Set, Method Parameters: Param.exportmode = "excel"
Method: Call pzRDExportWrapper, and I pass the current parameters (there is only one, from the 1st step).
After I ran into an issue, I changed the 2nd step to Call Rule-Obj-Report-Definition.pzRDExportWrapper.
But as you have already understood, the solution doesn't work. I checked the log files and found an interesting error:
2017-04-11 21:08:27,992 [ WebContainer : 4] [OpenPortal] [ ] [ MyFW:01.01.02] (ctionWrapper._baseclass.Action) ERROR as1|172.22.254.110 bar - Activity 'MyCustomActivity' failed to execute; Failed to find a 'RULE-OBJ-ACTIVITY' with the name 'PZRESOLVECOPYFILTERS' that applies to 'COM-FW-MyFW-Work'. There were 3 rules with this name in the rulebase, but none matched this request. The 3 rules named 'PZRESOLVECOPYFILTERS' defined in the rulebase are:
2017-04-11 21:08:42,807 [ WebContainer : 4] [TABTHREAD1] [ ] [ MyFW:01.01.02] (fileSetup.Code_Security.Action) ERROR as1|172.22.254.110 bar - External authentication failed:
If anyone has any suggestions to share, I would appreciate it.
Thank you.
I wanted to provide functionality for exporting retrieved work items to a CSV file. The functionality should allow choosing which fields to retrieve, all results should be in Ukrainian, and it should be able to work with any SearchFilter pages and Report Definition rules.
In the User Portal I have two sections: the first section contains text fields and a Search button, and the second contains a Repeat Grid to display results. The text fields are used to filter results and they use a page Org-Div-Work-SearchFilter.
I made a custom CSV exporter. I created two activities and wrote some Java code. I should mention that I took some code from pzRDExportWrapper.
The activities are:
ExportToCSV - takes parameters from the user, gets the data, and invokes ConvertResultsToCSV;
ConvertResultsToCSV - converts the retrieved data to a .CSV file.
Configuration of the ExportToCSV activity:
The Pages And Classes tab:
ReportDefinition is an object of a certain Report Definition.
SearchFilter is a page with the values entered by the user.
ReportDefinitionResults is a list of retrieved work items to export.
ReportDefinitionResults.pxResults denotes the type of a certain work item.
The Parameters tab:
FileName is the name of the generated file.
ColumnsNames is a comma-separated list of column names. If the parameter is empty, CSVProperties is exported.
CSVProperties is a comma-separated list of properties to display in the spreadsheet.
SearchPageName is the name of the page used to filter results.
ReportDefinitionName is the name of the Report Definition used to retrieve results.
ReportDefinitionClass is the class of the used Report Definition.
The Steps tab:
Let's look through the steps:
1. Get a SearchFilter page, with populated fields, whose name comes from a parameter:
2. If SearchFilter is not empty, call a Data Transform to convert the SearchFilter's properties to parameter properties:
A fragment of the Data Transform:
3. Get an object of the Report Definition:
4. Set parameters for the Report Definition:
5. Invoke the Report Definition and save the results to ReportDefinitionResults:
6. Invoke the ConvertResultsToCSV activity:
7. Delete the result page:
An overview of the ConvertResultsToCSV activity.
The Parameters tab of the ConvertResultsToCSV activity:
CSVProperties are the properties to retrieve and export.
ColumnsNames are the names of the columns to display.
PageListProperty is the name of the page list property to read on the primary page.
FileName is the name of the generated file. Can be empty.
AppendTimeStampToFileName - if true, the file generation time is appended to the file name.
CSVString is the generated CSV string to be saved to a file.
FileName is the name of the file.
listSeperator is always a semicolon, used to separate fields.
Let's skim through the steps in the activity:
Get the localization from the user settings (commented out):
In theory it could support localization in many languages.
Always set the "uk" (Ukrainian) localization.
Get the separator according to the localization. It is always a semicolon in Ukrainian, English and Russian; other languages would need to be checked.
This step contains Java code which builds the CSV string:
StringBuffer csvContent = new StringBuffer(); // the CSV content being built
String pageListProp = tools.getParamValue("PageListProperty");
ClipboardProperty resultsProp = myStepPage.getProperty(pageListProp);
// fill the properties names list
java.util.List<String> propertiesNames = new java.util.LinkedList<String>(); // names of properties whose values are displayed in the csv
String csvProps = tools.getParamValue("CSVProperties");
propertiesNames = java.util.Arrays.asList(csvProps.split(","));
// get the user's column names
java.util.List<String> columnsNames = new java.util.LinkedList<String>();
String CSVDisplayProps = tools.getParamValue("ColumnsNames");
if (!CSVDisplayProps.isEmpty()) {
    columnsNames = java.util.Arrays.asList(CSVDisplayProps.split(","));
} else {
    columnsNames.addAll(propertiesNames);
}
// add the column headers to the csv content
Iterator columnsIter = columnsNames.iterator();
while (columnsIter.hasNext()) {
    csvContent.append(columnsIter.next().toString());
    if (columnsIter.hasNext()) {
        csvContent.append(listSeperator); // listSeperator - local variable
    }
}
csvContent.append("\r");
for (int i = 1; i <= resultsProp.size(); i++) {
    ClipboardPage propPage = resultsProp.getPageValue(i);
    Iterator iterator = propertiesNames.iterator();
    int propTypeIndex = 0;
    while (iterator.hasNext()) {
        ClipboardProperty clipProp = propPage.getIfPresent((iterator.next()).toString());
        String propValue = "";
        if (clipProp != null && !clipProp.isEmpty()) {
            char propType = clipProp.getType();
            propValue = clipProp.getStringValue();
            if (propType == ImmutablePropertyInfo.TYPE_DATE) {
                DateTimeUtils dtu = ThreadContainer.get().getDateTimeUtils();
                long mills = dtu.parseDateString(propValue);
                java.util.Date date = new Date(mills);
                String sdate = dtu.formatDateTimeStamp(date);
                propValue = dtu.formatDateTime(sdate, "dd.MM.yyyy", "", "");
            }
            else if (propType == ImmutablePropertyInfo.TYPE_DATETIME) {
                DateTimeUtils dtu = ThreadContainer.get().getDateTimeUtils();
                propValue = dtu.formatDateTime(propValue, "dd.MM.yyyy HH:mm", "", "");
            }
            else if (propType == ImmutablePropertyInfo.TYPE_DECIMAL) {
                propValue = PRNumberFormat.format(localeCode, PRNumberFormat.DEFAULT_DECIMAL, false, null, new BigDecimal(propValue));
            }
            else if (propType == ImmutablePropertyInfo.TYPE_DOUBLE) {
                propValue = PRNumberFormat.format(localeCode, PRNumberFormat.DEFAULT_DECIMAL, false, null, Double.parseDouble(propValue));
            }
            else if (propType == ImmutablePropertyInfo.TYPE_TEXT) {
                propValue = clipProp.getLocalizedText();
            }
            else if (propType == ImmutablePropertyInfo.TYPE_INTEGER) {
                Integer intPropValue = Integer.parseInt(propValue);
                if (intPropValue < 0) {
                    propValue = new String();
                }
            }
        }
        if (propValue.contains(listSeperator)) {
            csvContent.append("\"" + propValue + "\"");
        } else {
            csvContent.append(propValue);
        }
        if (iterator.hasNext()) {
            csvContent.append(listSeperator);
        }
        propTypeIndex++;
    }
    csvContent.append("\r");
}
CSVString = csvContent.toString();
5. This step forms and saves a file in the server's directory tree:
char sep = PRFile.separatorChar;
String exportPath = tools.getProperty("pxProcess.pxServiceExportPath").getStringValue();
DateTimeUtils dtu = ThreadContainer.get().getDateTimeUtils();
String fileNameParam = tools.getParamValue("FileName");
if (fileNameParam.equals("")) {
    fileNameParam = "RecordsToCSV";
}
//append a time stamp
Boolean appendTimeStamp = tools.getParamAsBoolean(ImmutablePropertyInfo.TYPE_TRUEFALSE, "AppendTimeStampToFileName");
FileName += fileNameParam;
if (appendTimeStamp) {
    FileName += "_";
    String currentDateTime = dtu.getCurrentTimeStamp();
    currentDateTime = dtu.formatDateTime(currentDateTime, "HH-mm-ss_dd.MM.yyyy", "", "");
    FileName += currentDateTime;
}
//append the file extension
FileName += ".csv";
String strSQLfullPath = exportPath + sep + FileName;
PRFile f = new PRFile(strSQLfullPath);
PROutputStream stream = null;
PRWriter out = null;
try {
    // Create the file
    stream = new PROutputStream(f);
    out = new PRWriter(stream, "UTF-8");
    // Bug with Excel reading a file starting with 'ID' as a SYLK file. If the CSV starts with ID, prepend a space.
    if (CSVString.startsWith("ID")) {
        CSVString = " " + CSVString;
    }
    out.write(CSVString);
} catch (Exception e) {
    oLog.error("Error writing csv file: " + e.getMessage());
} finally {
    try {
        // Close the output stream
        out.close();
    } catch (Exception e) {
        oLog.error("Error closing the file stream: " + e.getMessage());
    }
}
The last step calls @baseclass.DownloadFile to download the file:
Finally, we can place a button on some section or somewhere else and set up its Actions tab like this:
It also works fine inside a "Refresh Section" action.
A possible result could be:
Thanks for reading.

Riak high disk space usage

I am evaluating Riak KV 2.1.1 on a local desktop using the Java client and a slightly customised version of the sample code,
and my concern is that I found it to be taking almost 920 bytes per KV.
That's too steep. The data dir was 93 MB for 100k KVs and kept increasing linearly thereafter for every additional 100k store ops.
Is that expected?
RiakCluster cluster = setUpCluster();
RiakClient client = new RiakClient(cluster);
System.out.println("Client object successfully created");
Namespace quotesBucket = new Namespace("quotes2");
long start = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    RiakObject quoteObject = new RiakObject().setContentType("text/plain").setValue(BinaryValue.create("You're dangerous, Maverick"));
    Location quoteObjectLocation = new Location(quotesBucket, ("Ice" + i));
    StoreValue storeOp = new StoreValue.Builder(quoteObject).withLocation(quoteObjectLocation).build();
    StoreValue.Response storeOpResp = client.execute(storeOp);
}
There was a thread on the riak-users mailing list a while back that discussed the overhead of a Riak object, estimating it at ~400 bytes per object. However, that was before the new object format was introduced, so it is outdated. Here is a fresh look.
First we need a local client:
(node1@127.0.0.1)1> {ok,C}=riak:local_client().
{ok,{riak_client,['node1@127.0.0.1',undefined]}}
Create a new riak object with a 0-byte value:
(node1@127.0.0.1)2> Obj = riak_object:new(<<"size">>,<<"key">>,<<>>).
#r_object{bucket = <<"size">>,key = <<"key">>,
contents = [#r_content{metadata = {dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[],[],...}}},
value = <<>>}],
vclock = [],
updatemetadata = {dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],...}}},
updatevalue = undefined}
The object is actually stored in a reduced binary format:
(node1@127.0.0.1)3> byte_size(riak_object:to_binary(v1,Obj)).
36
That is 36 bytes of overhead for just the object, but that doesn't include metadata like the last updated time or the version vector, so store it in Riak and check again.
(node1@127.0.0.1)4> C:put(Obj).
ok
(node1@127.0.0.1)5> {ok,Obj1} = C:get(<<"size">>,<<"key">>).
{ok, #r_object{bucket = <<"size">>,key = <<"key">>,
contents = [#r_content{metadata = {dict,3,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[[...]],[...],...}}},
value = <<>>}],
vclock = [{<<204,153,66,25,119,94,124,200,0,0,156,65>>,
{3,63654324108}}],
updatemetadata = {dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],...}}},
updatevalue = undefined}}
(node1@127.0.0.1)6> byte_size(riak_object:to_binary(v1,Obj1)).
110
Now it is 110 bytes overhead for an empty object with a single entry in the version vector. If a subsequent put of the object is coordinated by a different vnode, it will add another entry. I've selected the bucket and key names so that the local node is not a member of the preflist, so the second put has a fair probability of being coordinated by a different node.
(node1@127.0.0.1)7> C:put(Obj1).
ok
(node1@127.0.0.1)8> {ok,Obj2} = C:get(<<"size">>,<<"key">>).
{ok, #r_object{bucket = <<"size">>,key = <<"key">>,
contents = [#r_content{metadata = {dict,3,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[[...]],[...],...}}},
value = <<>>}],
vclock = [{<<204,153,66,25,119,94,124,200,0,0,156,65>>,
{3,63654324108}},
{<<85,123,36,24,254,22,162,159,0,0,78,33>>,{1,63654324651}}],
updatemetadata = {dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],...}}},
updatevalue = undefined}}
(node1@127.0.0.1)9> byte_size(riak_object:to_binary(v1,Obj2)).
141
Which is another 31 bytes added for an additional entry in the version vector.
These numbers don't include storing the actual bucket and key names with the value, or Bitcask storing them again in a hint file, so the actual space on disk would be roughly 2 x (bucket name size + key name size) + value overhead + file structure overhead + checksum/hash size.
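As a rough worked example of that formula, using only the sizes from the question and the object overhead measured above (file structure and checksum/hash overheads are left out, since they depend on the backend):
public class RiakOverheadEstimate {
    public static void main(String[] args) {
        int bucketName = "quotes2".getBytes().length;               // bucket used in the question
        int keyName = "Ice99999".getBytes().length;                 // longest key in the 100k loop
        int value = "You're dangerous, Maverick".getBytes().length; // the stored value
        int objectOverhead = 110;                                   // empty object with one vclock entry (measured above)

        // 2 x (bucket + key) because Bitcask stores them in the data file and again in the hint file.
        int accountedBytes = 2 * (bucketName + keyName) + objectOverhead + value;
        System.out.println("accounted-for bytes per entry: " + accountedBytes);
        // File structure overhead and checksum/hash size come on top of this,
        // which the Bitcask capacity calculator linked below helps estimate.
    }
}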
If you're using bitcask, there is a calculator in the documentation that will help you estimate disk and memory requirements: http://docs.basho.com/riak/kv/2.2.0/setup/planning/bitcask-capacity-calc/
If you use eLevelDB, you have the option of snappy compression which could reduce the size on disk.

Parsing multiple lines with regex

I'm writing a program in Java that parses a BibTeX library file. Each entry should be parsed into a
field and a value. This is an example of a single BibTeX entry from a library:
@INPROCEEDINGS{conf/icsm/Ceccato07,
author = {Mariano Ceccato},
title = {Migrating Object Oriented code to Aspect Oriented Programming},
booktitle = {ICSM},
year = {2007},
pages = {497--498},
publisher = {IEEE},
bibdate = {2008-11-18},
bibsource = {DBLP, http://dblp.uni-trier.de/db/conf/icsm/icsm2007.html#Ceccato07},
crossref = {conf/icsm/2007},
owner = {Administrator},
timestamp = {2009.04.30},
url = {http://dx.doi.org/10.1109/ICSM.2007.4362668}
}
In this case, I just read the line and split it using the split method. For example, the first field (author) is parsed like this:
Scanner in = new Scanner(new File("library.bib"));
in.nextLine(); //skip the header
String input = in.nextLine(); //read (author = {Mariano Ceccato},)
String field = input.split("=")[0].trim(); //field = "author"
String value = input.split("=")[1]; //value = "{Mariano Ceccato},"
value = value.split("\\}")[0]; //value = "{Mariano Ceccato"
value = value.split("\\{")[1]; //value = "Mariano Ceccato"
value = value.trim(); //remove any white space (if any)
Up to now everything is good. However, there are BibTeX entries in the library whose values span multiple lines:
@ARTICLE{Aksit94AbstractingCF,
author = {Mehmet Aksit and Ken Wakita and Jan Bosch and Lodewijk Bergmans and
Akinori Yonezawa },
title = {{Abstracting Object Interactions Using Composition Filters}},
journal = {Lecture Notes in Computer Science},
year = {1994},
volume = {791},
pages = {152--??},
acknowledgement = {Nelson H. F. Beebe, Center for Scientific Computing, University of
Utah, Department of Mathematics, 110 LCB, 155 S 1400 E RM 233, Salt
Lake City, UT 84112-0090, USA, Tel: +1 801 581 5254, FAX: +1 801
581 4148, e-mail: \path|beebe@math.utah.edu|, \path|beebe@acm.org|,
\path|beebe@computer.org|, \path|beebe@ieee.org| (Internet), URL:
\path|http://www.math.utah.edu/~beebe/|},
bibdate = {Mon May 13 11:52:14 MDT 1996},
coden = {LNCSD9},
issn = {0302-9743},
owner = {aljasser},
timestamp = {2009.01.08}
}
As you see, the acknowledgement field is more than one line long, so I can't read it using nextLine(). My parsing function works fine with it if I pass it as a single String. So what is the best way to read this entry and other multi-line entries while still being able to read single-line entries?
The form of these entries is
@<type>{<Id>,
<name>={<value>},
....
<name>={<value>}
}
Note that the last name-value pair is not followed by a comma.
If a value is split over several lines, that simply means a particular line does not yet contain the closing brace. In that case, read the next line and append it to the string you are about to split. Keep doing this until the last characters in the string are "}," or "}" (the latter happens when 'acknowledgement' is the last name-value pair in the record).
For extra safety, check that the number of closing braces matches the number of opening braces, and keep appending lines to your string until it does. This covers situations where a long title in an article happens to break at an unfortunate place, such as
title = {{Abstracting Object Interactions Using Composition Filters, and other stuff}
},
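A minimal sketch of that brace-counting approach, assuming the input is read line by line with a Scanner as in the question (the method and variable names here are mine, not from the original code):
import java.util.Scanner;

public class BibtexFieldReader {

    // Reads one complete "name = {value}," field, even if it spans several lines,
    // by appending lines until every opening brace has a matching closing brace.
    static String readField(Scanner in) {
        StringBuilder field = new StringBuilder(in.nextLine());
        int depth = braceDelta(field.toString());
        while (depth > 0 && in.hasNextLine()) {
            field.append(' ').append(in.nextLine());
            depth = braceDelta(field.toString());
        }
        return field.toString();
    }

    // Number of unmatched opening braces in s (closing braces subtract).
    static int braceDelta(String s) {
        int depth = 0;
        for (char c : s.toCharArray()) {
            if (c == '{') depth++;
            else if (c == '}') depth--;
        }
        return depth;
    }
}
Call readField only after skipping the entry's header line (the "@TYPE{id," line, which contains an unmatched opening brace), then split the returned string exactly as before.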
For these kinds of issues, it is always better to use a dedicated parser.
I googled for a BibTeX parser and found this.
If you would rather keep your own code as you are doing, one solution to this problem is to check whether the line ends with "}"; if not, append the next line to the current one.
Having said that, there might be other issues; that's why I suggested using a parser.
