Error in dataSearch using Berkeley DB - Java

I've created a database using Berkeley DB that stores N records, where each record is a key/value pair. I originally populated it with only 20 records. With 20 records I could do a key search and a data search (where I walk the database record by record, looking for a data value that matches the string entered by the user).
public String dataSearch (String dataInput) {
String foundKey = null;
String foundData = null;
Cursor cursor = null;
try {
cursor = myDb.openCursor(null, null);
DatabaseEntry theKey = new DatabaseEntry();
DatabaseEntry theData = new DatabaseEntry();
while (cursor.getNext(theKey, theData, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
foundKey = new String(theKey.getData(), "UTF-8");
foundData = new String(theData.getData(), "UTF-8");
// this is to see each key - data - inputdata as I was having an issue
System.out.println("KEY: " + foundKey +
"\nDATA: " + foundData +
"\nINPUT_DATA: " + dataInput + "\n\n");
if (foundData.equals(dataInput)) {
System.out.println("-----------------------------------\n\n");
System.out.println("Found record: " + foundKey +
"\nwith data: " + foundData);
System.out.println("\n\n-----------------------------------");
}
}
/* I then close the cursor and catch exceptions and such */
This works fine when I have 20 records or fewer, but with a larger number I see some odd behaviour. I set the number of records to 1000. The last key/data values inserted into the database are:
KEY: zghxnbujnsztazmnrmrlhjsjfeexohxqotjafliiktlptsquncuejcrebaohblfsqazznheurdqbqbxjmyqr
DATA: jzpqaymwwnoqzvxykowdhxvfbuhrsfojivugrmvmybbvurxmdvmrclalzfscmeknyzkqmrcflzdooyupwznvxikermrbicapynwspbbritjyeltywmmslpeuzsmh
I had it print the last values inserted into the database, then did a key search on the above key to confirm that the data above was in fact the data associated with that key. However, when I do a data search on the data listed above, no matching record is found (whereas the same process found a record when there were 20 records). I looked into it further and had my data search print each key/data pair that it returned, and found the following:
KEY: zghxnbujnsztazmnrmrlhjsjfeexohxqotjafliiktlptsquncuejcrebaohblfsqazznheurdqbqbxjmyqrpzlyvnmdlvgyvzhbceeftcqssbeckxkuepxyphsgdzd
DATA: jzpqaymwwnoqzvxykowdhxvfbuhrsfojivugrmvmybbvurxmdvmrclalzfscmeknyzkqmrcflzdooyupwznvxikermrbicapynwspbbritjyeltywmmslpeuzsmhozy
INPUT DATA: jzpqaymwwnoqzvxykowdhxvfbuhrsfojivugrmvmybbvurxmdvmrclalzfscmeknyzkqmrcflzdooyupwznvxikermrbicapynwspbbritjyeltywmmslpeuzsmh
As you can see, extra bytes appear to have been appended to the key and data values. However, if I do a key search these extra bytes don't show up, so I think the problem is in the dataSearch function. The same results occur whether I use a B+tree or a hash.
Any ideas?
Thanks

After a long time looking at this, I realized my error was that I was not reinitializing the theKey and theData variables between iterations.
The fix is at the end of the while loop:
while (cursor.getNext(theKey, theData, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
foundKey = new String(theKey.getData(), "UTF-8");
foundData = new String(theData.getData(), "UTF-8");
// this is to see each key - data - inputdata as I was having an issue
System.out.println("KEY: " + foundKey +
"\nDATA: " + foundData +
"\nINPUT_DATA: " + dataInput + "\n\n");
if (foundData.equals(dataInput)) {
System.out.println("-----------------------------------\n\n");
System.out.println("Found record: " + foundKey +
"\nwith data: " + foundData);
System.out.println("\n\n-----------------------------------");
}
// THIS IS THE FIX
theKey = new DatabaseEntry();
theData = new DatabaseEntry();
// ----------------------------
}
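The symptom above (extra trailing bytes that change between records) is what you would see if a reused entry keeps a backing array longer than the current record. That this is what happens inside Berkeley DB here is an assumption inferred from the output, not something the post confirms. The sketch below uses only the standard library: ReusedBufferDemo and its methods are hypothetical stand-ins, not Berkeley DB classes. If the assumption holds, decoding with the entry's offset and size (DatabaseEntry exposes getOffset() and getSize() next to getData()) would be an alternative to creating new entries on every iteration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a reused entry: one backing array that is
// overwritten by each record, plus the size of the current record.
final class ReusedBufferDemo {

    // Copies the record into the start of the reused buffer and returns
    // how many bytes belong to it. Bytes past that point are left as-is.
    static int writeRecord(byte[] buffer, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(bytes, 0, buffer, 0, bytes.length);
        return bytes.length;
    }

    // Decodes the entire backing array, stale tail included.
    static String decodeWholeArray(byte[] buffer) {
        return new String(buffer, StandardCharsets.UTF_8);
    }

    // Decodes only the bytes that belong to the current record.
    static String decodeRecord(byte[] buffer, int offset, int size) {
        return new String(buffer, offset, size, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[8];

        // First record fills the whole buffer.
        writeRecord(buffer, "longdata");

        // Second record is shorter, so three stale bytes remain behind it.
        int size = writeRecord(buffer, "short");

        System.out.println(decodeWholeArray(buffer));      // prints "shortata"
        System.out.println(decodeRecord(buffer, 0, size)); // prints "short"
    }
}
```

With the whole-array decode, equals() against the user's input fails even though the record is present, which matches the behaviour described in the question.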

DynamoDB scan seems to not do anything

Basically, trying to scan a DynamoDB table to correct a data problem (which is why it is a scan):
var tableName = dynamoDbMapperProvider.getTableName(DynamoSnapshot.class, address);
log("Starting re-encryption for table " + tableName);
int migratedSnapshotsCount = 0;
for (var snapshot : dynamoDBMapper.scan(DynamoSnapshot.class, snapshotScanExpression)) {
if (snapshot.getEncryptedKey() != null) {
try {
encryptionService.decrypt(
EnvelopeMessage.of(snapshot.getBinaryData(), snapshot.getEncryptedKey(), snapshot.getCmkId())
);
// decryption okay so just set Cmk to configured value as a side-effect...
log("Updating CmkId for " + snapshot.getRootId());
dynamoDBMapper.save(snapshot.withCmkId(cmkId));
} catch (EncryptDecryptException ex) {
if (ex.isKmsError()) {
// ... reencrypt and log...
migratedSnapshotsCount++;
}
}
}
}
log(
"Re-encrypt encrypted key process completed for table " + tableName + ", " + migratedSnapshotsCount +
" Snapshots migrated and re-encrypted."
);
This is running in the cloud so I cannot debug this locally. I would expect at least 2 log messages, but we're only getting the 1st from before the for-each loop and then nothing, no exceptions, just nothing. How is that even possible?
Scan is lazily loaded by default. Try setting the PaginationLoadingStrategy in your DynamoDBMapperConfig to EAGER_LOADING so that the data is returned on execution.
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/dynamodbv2/datamodeling/DynamoDBMapperConfig.PaginationLoadingStrategy.html

OpenNLP classifier output

At the moment I'm using the following code to train a classifier model:
final String iterations = "1000";
final String cutoff = "0";
InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/classifierA.txt"));
ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
TrainingParameters params = new TrainingParameters();
params.put(TrainingParameters.ITERATIONS_PARAM, iterations);
params.put(TrainingParameters.CUTOFF_PARAM, cutoff);
params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);
DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());
OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("src/main/resources/models/model.bin"));
model.serialize(modelOut);
return model;
This goes well, and after every run I get the following output:
Indexing events with TwoPass using cutoff of 0
Computing event counts... done. 1474 events
Indexing... done.
Collecting events... Done indexing in 0,03 s.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1474
Number of Outcomes: 2
Number of Predicates: 4149
Computing model parameters...
Stats: (998/1474) 0.6770691994572592
...done.
Could someone explain what this output means, and whether it says anything about the accuracy?
Looking at the source, we can tell this output is produced by the NaiveBayesTrainer::trainModel method:
public AbstractModel trainModel(DataIndexer di) {
// ...
display("done.\n");
display("\tNumber of Event Tokens: " + numUniqueEvents + "\n");
display("\t Number of Outcomes: " + numOutcomes + "\n");
display("\t Number of Predicates: " + numPreds + "\n");
display("Computing model parameters...\n");
MutableContext[] finalParameters = findParameters();
display("...done.\n");
// ...
}
If you take a look at findParameters() code, you'll notice that it calls the trainingStats() method, which contains the code snippet that calculates the accuracy:
private double trainingStats(EvalParameters evalParams) {
// ...
double trainingAccuracy = (double) numCorrect / numEvents;
display("Stats: (" + numCorrect + "/" + numEvents + ") " + trainingAccuracy + "\n");
return trainingAccuracy;
}
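TL;DR: the Stats: (998/1474) 0.6770691994572592 part of the output is the accuracy you're looking for. It means 998 of the 1474 training events were classified correctly. Note that this is accuracy measured on the training data itself, as the trainingAccuracy variable name indicates.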
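As a quick check of that arithmetic, here is a minimal standalone sketch that repeats the same division as the trainingStats() snippet above. TrainingAccuracyDemo is a hypothetical class written for illustration and is not part of OpenNLP; the two counts are taken from the log output in the question.

```java
// Hypothetical helper that mirrors the division shown in trainingStats().
final class TrainingAccuracyDemo {

    // Fraction of training events whose predicted outcome was correct.
    static double trainingAccuracy(int numCorrect, int numEvents) {
        // The cast avoids integer division, which would yield 0 here.
        return (double) numCorrect / numEvents;
    }

    public static void main(String[] args) {
        int numCorrect = 998;  // from "Stats: (998/1474)"
        int numEvents = 1474;

        double accuracy = trainingAccuracy(numCorrect, numEvents);
        System.out.println("Stats: (" + numCorrect + "/" + numEvents + ") " + accuracy);
    }
}
```

So the model gets roughly 67.7% of its own training samples right. To estimate how it performs on unseen text, you would need to evaluate it on data that was held out from training.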

AWS QueryRequest not returning any values from secondary index query

I am trying to query one of my DynamoDB tables and for some reason it is not returning any results. I have another user table that has almost exactly the same index, and it returns a value. Here is my code:
String fbId = requestInfo.get("requestFbId");
System.out.println("The id is " + fbId);
Map<String, AttributeValue> exAttributeVal = new HashMap<String, AttributeValue>();
exAttributeVal.put(":val", new AttributeValue().withS(fbId));
QueryRequest friendsQuery = new QueryRequest()
.withTableName(Keys.friendsTable)
.withIndexName("User-Friends-index")
.withKeyConditionExpression("userId = :val")
.withExpressionAttributeValues(exAttributeVal);
QueryResult friendsQueryResult = dynamoDbClient.query(friendsQuery);
System.out.println("The size is " + friendsQueryResult.getItems().size());
for (int i = 0; i < friendsQueryResult.getItems().size(); i++) {
System.out.println("The result is " + friendsQueryResult.getItems().get(i));
}
Does anybody know what I could be doing wrong here? This also used to work directly in my Android app, but it is not working now that I have moved it into Lambda.

Create custom InputFormat of ColumnFamilyInputFormat for Cassandra

I am working on a project using Cassandra 1.2 and Hadoop 1.2.
I have created my normal Cassandra mapper and reducer, but I want to create my own InputFormat class that reads the records from Cassandra and extracts the desired column's value by splitting it and indexing into the result.
So I planned to create a custom InputFormat class, but I'm confused about how to go about it. Which classes should I extend and implement, and how will I be able to fetch the row key, column name, column value, and so on?
I have my Mapper class as follows:
public class MyMapper extends
Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, Text> {
private Text word = new Text();
MyJDBC db = new MyJDBC();
public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns,
Context context) throws IOException, InterruptedException {
long std_id = Long.parseLong(ByteBufferUtil.string(key));
long newSavePoint = 0;
if (columns.values().isEmpty()) {
System.out.println("EMPTY ITERATOR");
sb.append("column_N/A" + ":" + "N/A" + " , ");
} else {
for (IColumn cell : columns.values()) {
name = ByteBufferUtil.string(cell.name());
String value = null;
if (name.contains("int")) {
value = String.valueOf(ByteBufferUtil.toInt(cell.value()));
} else {
value = ByteBufferUtil.string(cell.value());
}
String[] data = value.toString().split(",");
// if (data[0].equalsIgnoreCase("login")) {
Long[] dif = getDateDiffe(d1, d2);
// logics i want to perform inside my custominput class , rather here, i just want a simple mapper class
if (condition1 && condition2) {
myhits++;
sb.append(":\t " + data[0] + " " + data[2] + " "+ data[1] /* + " " + data[3] */+ "\n");
newSavePoint = d2;
}
}
sb.append("~" + like + "~" + newSavePoint + "~");
word.set(sb.toString().replace("\t", ""));
}
db.setInterval(Long.parseLong(ByteBufferUtil.string(key)), newSavePoint);
db.setHits(Long.parseLong(ByteBufferUtil.string(key)), like + "");
context.write(new Text(ByteBufferUtil.string(key)), word);
}
I want to reduce the logic in my Mapper class and perform the same calculations in my custom input class.
Please help; I'm hoping for a positive response from stackies...
You can do the intended task by moving the Mapper logic to your custom input class (as you have indicated already).
I found this nice post, which explains a problem similar to yours. I think it might solve your problem.

How to get spooled file lists separately by format - Java (JT400)

I get the spooled file list in Java using JT400, but I want to get the advanced spooled file list (*.TIFF image-formatted spooled files) and the normal spooled file list (readable as text) separately. Does anyone know how to do that?
Thanks in advance!
try{
AS400 server = new AS400();
System.out.println(" Now receiving all spooled files Synchronously");
SpooledFileList splfList = new SpooledFileList( server );
// set filters, all users, on all queues
splfList.setUserFilter("user");
splfList.setQueueFilter("/QSYS.LIB/%ALL%.LIB/%ALL%.OUTQ");
// open list, openSynchronously() returns when the list is completed.
splfList.openSynchronously();
// Enumeration enum = splfList.getObjects();
Enumeration enumx = splfList.getObjects();
while(enumx.hasMoreElements())
{
SpooledFile splf = (SpooledFile)enumx.nextElement();
if ( splf != null )
{
String Name = splf.getName();
int Number = splf.getNumber();
String jobname = splf.getJobName();
String jobuser = splf.getJobUser();
String jobnumber = splf.getJobNumber();
// strSpooledNumber = splf.getStringAttribute(SpooledFile.)
System.out.println(" spooled file = Name :" + Name + " number : " + Number + " JobName : " + jobname + " job user : " + jobuser + " job Number : " + jobnumber);
}
}
// clean up after we are done with the list
splfList.close();
}
catch( Exception e )
{
e.printStackTrace();
}
The existing class doesn't have a filter on printer device type, although you could add one using getUserFilter as an example.
Once you have the full list of spooled files, you could split them yourself into two groups. Try String prtdevtype = splf.getStringAttribute(PrintObject.ATTR_PRTDEVTYPE);
From this you can tell whether you have a text spooled file (*SCS) or one with graphics in it (*IPDS, *AFPDS).
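The grouping step could be sketched as follows. This is a standalone illustration using only the standard library, so it takes the device-type strings as plain input rather than calling JT400. SpooledFileSplitDemo, its method names, and the sample file names are all hypothetical. In real code you would fill the input map from splf.getName() and the printer device type attribute inside the enumeration loop shown in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: groups spooled file names by printer device type.
final class SpooledFileSplitDemo {

    // Maps a printer device type to a group name, following the answer:
    // *SCS is text; *IPDS and *AFPDS may contain graphics.
    static String classify(String prtDevType) {
        String type = prtDevType == null ? "" : prtDevType.trim().toUpperCase(Locale.ROOT);
        switch (type) {
            case "*SCS":
                return "text";
            case "*IPDS":
            case "*AFPDS":
                return "graphics";
            default:
                return "other"; // anything this sketch does not recognise
        }
    }

    // Input: spooled file name -> printer device type.
    // Output: group name -> spooled file names, in input order.
    static Map<String, List<String>> splitByDeviceType(Map<String, String> devTypeByName) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("text", new ArrayList<>());
        groups.put("graphics", new ArrayList<>());
        groups.put("other", new ArrayList<>());
        for (Map.Entry<String, String> entry : devTypeByName.entrySet()) {
            groups.get(classify(entry.getValue())).add(entry.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample data, made up for illustration.
        Map<String, String> sample = new LinkedHashMap<>();
        sample.put("QSYSPRT", "*SCS");
        sample.put("INVOICE", "*AFPDS");
        sample.put("LABELS", "*IPDS");

        System.out.println(splitByDeviceType(sample));
    }
}
```

The "other" group is a design choice of this sketch, so that device types it does not recognise are kept visible instead of being silently placed in one of the two lists.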
