How to read and write records in EhCache? - java

Hi All,
My current requirement is to store and read records using EhCache. I am new to the EhCache implementation. I have read the EhCache documentation and started implementing it. I have finished the insert part as well as the read part. When records are inserted, *.data and *.index files are created. Following is the code:
public class Driver {
    public static void main(String[] args) {
        CacheManager cm = CacheManager.create("ehcache.xml");
        Cache cache = cm.getCache("test");
        // I do a couple of puts
        for (int i = 0; i < 10; i++) {
            cache.put(new Element("key" + i, "val" + i));
            cache.flush();
        }
        System.out.println(cache.getKeys());
        for (int i = 0; i < 10; i++) {
            Element el = cache.get("key" + i);
            System.out.println(el.getObjectValue());
        }
        cm.shutdown();
    }
}
Now the issue is with cm.shutdown(). If I comment out this line, and then comment out the insert part and run the program again, I am not able to retrieve the records, and the *.index file is also deleted. So in a real scenario, if the program is stopped abruptly, we can't read the records after startup. I want to know why the file is deleted and why I can't read the records in this situation. The exception coming in the console is:
net.sf.ehcache.util.SetAsList@b66cc
Exception in thread "main" java.lang.NullPointerException
at Driver.main(Driver.java:29)...
Any input is appreciated, please.

What you are doing is correct, and the behaviour you see is the expected one. Caches are typically used to enhance application performance by providing frequently used data quickly, while avoiding costly trips to the datastore.
Not all applications need to persist the cache after the system is shut down, and that is the default behaviour you are seeing (most applications build the cache on application startup or as requests start coming in). The data you are caching is on the heap, and as soon as your JVM dies, the cache is gone. You want to persist it beyond a restart? There are options available. Look them up here.
And I am copying the code snippet right from the same page:
DiskStoreConfiguration diskStoreConfiguration = new DiskStoreConfiguration();
diskStoreConfiguration.setPath("/my/path/dir");
// Already created a configuration object ...
configuration.addDiskStore(diskStoreConfiguration);
// By adding configuration for storing the cache in a file - you are not using default cache manager
CacheManager mgr = new CacheManager(configuration);
In addition, you will also have to configure the persistence options, as explained here.
Again copying code snippet from link:
<cache>
    <persistence strategy="localRestartable" synchronousWrites="true"/>
</cache>
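For completeness, here is a hedged sketch of how the two pieces above can be combined programmatically, assuming Ehcache 2.6+; the path, cache name, and sizing are placeholders, and the chosen persistence strategy must be supported by your Ehcache edition:
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.Configuration;
import net.sf.ehcache.config.DiskStoreConfiguration;
import net.sf.ehcache.config.PersistenceConfiguration;
import net.sf.ehcache.config.PersistenceConfiguration.Strategy;

public class PersistentCacheExample {
    public static void main(String[] args) {
        // Programmatic equivalent of the XML above; path and cache name are placeholders
        Configuration configuration = new Configuration();
        configuration.addDiskStore(new DiskStoreConfiguration().path("/my/path/dir"));

        CacheConfiguration cacheConfig = new CacheConfiguration("test", 10000)
                .persistence(new PersistenceConfiguration().strategy(Strategy.LOCALRESTARTABLE));
        configuration.addCache(cacheConfig);

        CacheManager mgr = new CacheManager(configuration);
        Cache cache = mgr.getCache("test");
        cache.put(new Element("key0", "val0"));
        mgr.shutdown(); // flushes and closes the disk store cleanly
    }
}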
Hope this helps!

Related

Apache Solr filtering not working but possible to retrieve by id

Background:
We have a 3-node Solr cloud that was migrated to Docker. It works as expected; however, new data that is inserted can only be retrieved by id. Once we try to use filters, it doesn't show up. Note that old data can still be filtered without any issues.
The database is used via a spring-boot CRUD-like application.
More background:
The app and the Solr setup were migrated by another person, and I have inherited the codebase recently, so I am not familiar with the implementation in much detail and am still digging and debugging.
The nodes were migrated as-is (the data was copied into a docker mount).
What I have so far:
I have checked the logs of all the solr nodes and see the following happening when making the calls to the application:
Filtering:
2019-02-22 14:17:07.525 INFO (qtp15xxxxx-15) [c:content_api s:shard1 r:core_node1 x:content_api_shard1_replica0] o.a.s.c.S.Request
[content_api_shard1_replica0]
webapp=/solr path=/select
params=
{q=*:*&start=0&fq=id-lws-ttf:127103&fq=active-boo-ttf:(true)&fq=(publish-date-tda-ttf:[*+TO+2019-02-22T15:17:07Z]+OR+(*:*+NOT+publish-date-tda-ttf:[*+TO+*]))AND+(expiration-date-tda-ttf:[2019-02-22T15:17:07Z+TO+*]+OR+(*:*+NOT+expiration-date-tda-ttf:[*+TO+*]))&sort=create-date-tda-ttf+desc&rows=10&wt=javabin&version=2}
hits=0 status=0 QTime=37
Get by ID:
2019-02-22 14:16:56.441 INFO (qtp15xxxxxx-16) [c:content_api s:shard1 r:core_node1 x:content_api_shard1_replica0] o.a.s.c.S.Request
[content_api_shard1_replica0]
webapp=/solr path=/get params={ids=https://example.com/app/contents/127103/middle-east&wt=javabin&version=2}
status=0 QTime=0
Disclaimer:
I am an absolute beginner in working with Solr and am going through documentation ATM in order to get better insight into the nuts and bolts.
Assumptions and WIP:
The person who migrated it told me that only the data was copied, not the configuration. I have acquired the old config files (/opt/solr/server/solr/configsets/) and am trying to compare them to the new ones. But the assumption is that the configs were defaults.
The old version was 6.4.2 and the new one is 6.6.5 (not sure whether this could be the issue).
Is there something obvious that we are missing here? What is super confusing is the fact that the data can be retrieved by id AND that the OLD data can be filtered.
Update:
After some research, I have to say that I have excluded the config issue, because when I inspect the configuration from the admin UI, I see the correct configuration.
Also, another weird behaviour is that the data can be queried after some time (like more than 5 days). I can see that because I run the query from the UI and order it by descending creation date. From there, I can now see the test records I inserted just days ago.
Relevant commit config part:
<autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
More config output from the admin endpoint:
config: {
  znodeVersion: 0,
  luceneMatchVersion: "org.apache.lucene.util.Version:6.0.1",
  updateHandler: {
    indexWriter: {
      closeWaitsForMerges: true
    },
    commitWithin: {
      softCommit: true
    },
    autoCommit: {
      maxDocs: -1,
      maxTime: 15000,
      openSearcher: false
    },
    autoSoftCommit: {
      maxDocs: -1,
      maxTime: -1
    }
  },
  query: {
    useFilterForSortedQuery: false,
    queryResultWindowSize: 20,
    queryResultMaxDocsCached: 200,
    enableLazyFieldLoading: true,
    maxBooleanClauses: 1024,
    filterCache: {
      autowarmCount: "0",
      size: "512",
      initialSize: "512",
      class: "solr.FastLRUCache",
      name: "filterCache"
    },
    queryResultCache: {
      autowarmCount: "0",
      size: "512",
      initialSize: "512",
      class: "solr.LRUCache",
      name: "queryResultCache"
    },
    documentCache: {
      autowarmCount: "0",
      size: "512",
      initialSize: "512",
      class: "solr.LRUCache",
      name: "documentCache"
    },
    fieldValueCache: {
      size: "10000",
      showItems: "-1",
      initialSize: "10",
      name: "fieldValueCache"
    }
  },
...
According to your examples, you're only retrieving the document when you're querying the realtime get endpoint, i.e. /get. This endpoint returns documents by querying by id, even if the document hasn't been committed to the index or a new searcher hasn't been opened.
A new searcher has to be created before any changes to the index become visible to the regular search endpoints, since the old searcher will still use the old index files for searching. If a new searcher isn't created, the stale content will still be returned. This matches the behaviour you're seeing, where you're not opening any new searchers, and content becomes visible when the searcher is recycled for other reasons (possibly because of restarts/another explicit commit/merges/optimizes/etc.).
Your example configuration shows that autoSoftCommit is disabled, while the regular autoCommit is set not to open a new searcher (and thus, no new content is shown). I usually recommend disabling this feature and instead relying on commitWithin in the URL, as it allows greater configurability for different types of data and lets you ask for a new searcher to be opened within x seconds after data has been added. The default behaviour for commitWithin is that a new searcher is opened after the commit has happened.
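As an illustration of the commitWithin suggestion (not from the original answer; the Solr URL, collection name, and field names are assumptions), a minimal SolrJ 6.x sketch could look like this:
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinExample {
    public static void main(String[] args) throws Exception {
        // Assumed Solr URL and collection name
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1");        // hypothetical field values
        doc.addField("active-boo-ttf", true);

        UpdateRequest req = new UpdateRequest();
        req.add(doc);
        // Ask Solr to commit (and open a new searcher) within 10 seconds,
        // instead of relying on the autoCommit/autoSoftCommit settings
        req.setCommitWithin(10000);
        req.process(client, "content_api");

        client.close();
    }
}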
Sounds like you might have switched to a default managed schema on upgrade. Look for schema.xml in your previous install, along with a schemaFactory section in your prior install's solrconfig.xml. More info at https://lucene.apache.org/solr/guide/6_6/schema-factory-definition-in-solrconfig.html#SchemaFactoryDefinitioninSolrConfig-SolrUsesManagedSchemabyDefault

Index not Found Exception

So, back again
I have a JHipster-generated project which uses an Elasticsearch Java client embedded in Spring Boot.
I have recently made some major changes to the datasets, since we've been migrating a whole new bunch of data from different repositories.
When deploying the application it all works fine: all SearchRepositories are loaded with no problem and all search capabilities run smoothly.
The issues come when running from the test environment. There have been no changes whatsoever to the application-test.yml file or to the Elasticsearch Java config file.
We have some code which updates the indices, and I've run it several times. It seems to update the cluster's indices just fine, but the problem is in the target folder: it just won't create the new indices.
There are 12 indices that I cannot get into the target folder when running in test mode; however, only 5 of them fail in their ResourceIntTest because of the error mentioned in the title.
I don't want to fill this post with hundreds of irrelevant lines of code, so for now I will only include the workaround that keeps the tests from failing:
In the initTest of the 5 failing test cases, if I write the following line (obviously changing the class name in each case):
surveyDataQualitySearchRepository.save(surveyDataQualityRepository.findAll());
Then the index creates itself and the test case does not fail. However, this shouldn't be necessary to do manually; the index should be created when the resetIndex method in the IndexReinitializer class is called upon deployment.
resetIndex:
@PostConstruct
public void resetIndex() {
    long t = currentTimeMillis();
    elasticsearchTemplate.deleteIndex("_all");
    t = currentTimeMillis() - t;
    logger.debug("ElasticSearch indexes reset in {} ms", t);
}
Commenting out this piece of code also allows all indices to be loaded, but it should not be commented out, as it serves as an updater for the indices; besides, it works fine in an old version of the application which still points to the old dataset.
All help is very welcome. I've been on this for almost a full day now trying to understand where the error is coming from. I'm also more than happy to upload any pieces of code that may be relevant to anyone willing to help.
EDIT: Adding the code for the indices rebuild, as requested in the comments.
@Test
public void synchronizeData() throws Exception {
    resetIndex();
    activePharmaIngredientSearchRepository.save(activePharmaIngredientRepository.findAll());
    countrySearchRepository.save(countryRepository.findAll());
    dosageUnitSearchRepository.save(dosageUnitRepository.findAll());
    drugCategorySearchRepository.save(drugCategoryRepository.findAll());
    drugQualityCategorySearchRepository.save(drugQualityCategoryRepository.findAll());
    formulationSearchRepository.save(formulationRepository.findAll());
    innDrugSearchRepository.save(innDrugRepository.findAll());
    locationSearchRepository.save(locationRepository.findAll());
    manufacturerSearchRepository.save(manufacturerRepository.findAll());
    outletTypeSearchRepository.save(outletTypeRepository.findAll());
    publicationSearchRepository.save(publicationRepository.findAll());
    publicationTypeSearchRepository.save(publicationTypeRepository.findAll());
    qualityReferenceSearchRepository.save(qualityReferenceRepository.findAll());
    reportQualityAssessmentAssaySearchRepository.save(reportQualityAssessmentAssayRepository.findAll());
    //rqaaQualitySearchRepository.save(rqaaQualityRepository.findAll());
    rqaaTechniqueSearchRepository.save(rqaaTechniqueRepository.findAll());
    samplingTypeSearchRepository.save(samplingTypeRepository.findAll());
    //surveyDataQualitySearchRepository.save(surveyDataQualityRepository.findAll());
    surveyDataSearchRepository.save(surveyDataRepository.findAll());
    techniqueSearchRepository.save(techniqueRepository.findAll());
    tradeDrugApiSearchRepository.save(tradeDrugApiRepository.findAll());
    tradeDrugSearchRepository.save(tradeDrugRepository.findAll());
    publicationDrugTypesSearchRepository.save(publicationDrugTypesRepository.findAll());
    wrongApiSearchRepository.save(wrongApiRepository.findAll());
}

private void resetIndex() {
    long t = currentTimeMillis();
    elasticsearchTemplate.deleteIndex("_all");
    t = currentTimeMillis() - t;
    logger.debug("ElasticSearch indexes reset in {} ms", t);
}
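As a side note (an assumption on my part, not from the original post): since deleteIndex("_all") removes every index, something has to recreate them before the search repositories are used. With spring-data-elasticsearch's ElasticsearchTemplate, explicit re-creation for one entity could look roughly like the sketch below; SurveyDataQuality is used only as an example class.
// Hedged sketch: explicitly recreate an index after it has been deleted.
// SurveyDataQuality is a hypothetical @Document-annotated entity class.
if (!elasticsearchTemplate.indexExists(SurveyDataQuality.class)) {
    elasticsearchTemplate.createIndex(SurveyDataQuality.class);  // create the index with the entity's settings
    elasticsearchTemplate.putMapping(SurveyDataQuality.class);   // apply the entity's field mapping
}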
Please try updating to the latest version of spring-data-elasticsearch.

Hbase CopyTable inside Java

I want to copy one Hbase table to another location with good performance.
I would like to reuse the code from CopyTable.java from the hbase-server GitHub repository.
I've been looking at the HBase documentation, but it didn't help me much: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CopyTable.html
After looking at this Stack Overflow post: Can a main() method of class be invoked in another class in java,
I think I can call it directly using its main class.
Question: Do you know of any better way to get this copy done rather than using CopyTable from hbase-server? Do you see any inconvenience in using CopyTable?
Question: Do you know of any better way to get this copy done rather than using CopyTable from hbase-server? Do you see any inconvenience in using CopyTable?
First of all, a snapshot is a better way than CopyTable.
HBase snapshots allow you to take a snapshot of a table without too much impact on region servers. Snapshot, clone, and restore operations don't involve data copying. Also, exporting the snapshot to another cluster has no impact on the region servers.
Prior to version 0.94.6, the only way to back up or clone a table was to use CopyTable/ExportTable, or to copy all the hfiles in HDFS after disabling the table. The disadvantages of these methods are that you can degrade region server performance (Copy/Export Table) or you need to disable the table, which means no reads or writes; this is usually unacceptable.
A snapshot is not just a rename; if, between multiple operations, you want to restore to one particular point, then this is the right tool to use:
A snapshot is a set of metadata information that allows an admin to get back to a previous state of the table. A snapshot is not a copy of the table; it’s just a list of file names and doesn’t copy the data. A full snapshot restore means that you get back to the previous “table schema” and you get back your previous data losing any changes made since the snapshot was taken.
Also, see Snapshots and Repeatable reads for HBase Tables
Snapshot Internals
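For reference, a minimal sketch of taking and cloning a snapshot from Java, assuming the classic HBaseAdmin API to match the code style below; the table and snapshot names are placeholders:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SnapshotCopyExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Take a snapshot of the source table (metadata only, no data copy)
            admin.snapshot("mytable-snapshot", "mytable");
            // Materialise it as a new table; hfiles are shared until they diverge
            admin.cloneSnapshot("mytable-snapshot", "mytable_copy");
        } finally {
            admin.close();
        }
    }
}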
Another MapReduce way besides CopyTable:
You can implement something like the code below. This is for a standalone program, whereas you would write a MapReduce job to insert multiple Put records as a batch (maybe 100,000).
This increased performance for standalone inserts into the HBase client; you can try the same in the MapReduce way.
public void addMultipleRecordsAtaShot(final ArrayList<Put> puts, final String tableName) throws Exception {
    try {
        final HTable table = new HTable(HBaseConnection.getHBaseConfiguration(), getTable(tableName));
        table.put(puts);
        LOG.info("INSERT record[s] " + puts.size() + " to table " + tableName + " OK.");
    } catch (final Throwable e) {
        e.printStackTrace();
    } finally {
        LOG.info("Processed ---> " + puts.size());
        if (puts != null) {
            puts.clear();
        }
    }
}
Along with that, you can also consider the following:
Set the write buffer to a larger value than the default:
1) table.setAutoFlush(false)
2) Set the buffer size:
<property>
    <name>hbase.client.write.buffer</name>
    <value>20971520</value> <!-- you can double this for better performance: 2 x 20971520 = 41943040 -->
</property>
OR
void setWriteBufferSize(long writeBufferSize) throws IOException
The buffer is only ever flushed on two occasions:
Explicit flush: use the flushCommits() call to send the data to the servers for permanent storage.
Implicit flush: this is triggered when you call put() or setWriteBufferSize().
Both calls compare the currently used buffer size with the configured limit and optionally invoke the flushCommits() method.
In case the entire buffer is disabled, setting setAutoFlush(true) will force the client to call the flush method for every invocation of put().
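Putting the buffer-related points together, a rough sketch using the old HTable client API from the answer above; the table name, column family, row keys, and sizes are placeholders:
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedPutExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "target_table");   // assumed table name
        try {
            table.setAutoFlush(false);                      // buffer puts client-side
            table.setWriteBufferSize(2L * 20971520L);       // 40 MB buffer, as suggested above

            List<Put> puts = new ArrayList<Put>();
            for (int i = 0; i < 100000; i++) {
                Put put = new Put(Bytes.toBytes("row-" + i));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
                puts.add(put);
            }
            table.put(puts);        // queued in the client-side write buffer
            table.flushCommits();   // explicit flush to the region servers
        } finally {
            table.close();
        }
    }
}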

Getting CPU 100 percent when I am trying to download CSV in Spring

I am getting a CPU performance issue on the server when I try to download a CSV in my project: the CPU goes to 100%, although SQL returns the response within 1 minute. We are writing around 600K records to the CSV. For one user it works fine, but for concurrent users we get this issue.
Environment
Spring 4.2.5
Tomcat 7/8 (RAM 2GB Allocated)
MySQL 5.0.5
Java 1.7
Here is the Spring controller code:
@RequestMapping(value = "csvData")
public void getCSVData(HttpServletRequest request,
        HttpServletResponse response,
        @RequestParam(value = "param1", required = false) String param1,
        @RequestParam(value = "param2", required = false) String param2,
        @RequestParam(value = "param3", required = false) String param3) throws IOException {
    List<Log> logs = service.getCSVData(param1, param2, param3);
    response.setHeader("Content-type", "application/csv");
    response.setHeader("Content-disposition", "inline; filename=logData.csv");
    PrintWriter out = response.getWriter();
    out.println("Field1,Field2,Field3,.......,Field16");
    for (Log row : logs) {
        out.println(row.getField1() + "," + row.getField2() + "," + row.getField3() + "......" + row.getField16());
    }
    out.flush();
    out.close();
}
Persistence code (I am using Spring JdbcTemplate):
@Override
public List<Log> getCSVLog(String param1, String param2, String param3) {
    String sql = SqlConstants.CSV_ACTIVITY.toString();
    List<Log> csvLog = jdbcTemplate.query(sql, new Object[]{param1, param2, param3},
        new RowMapper<Log>() {
            @Override
            public Log mapRow(ResultSet rs, int rowNum) throws SQLException {
                Log log = new Log();
                log.setField1(rs.getInt("field1"));
                log.setField2(rs.getString("field2"));
                log.setField3(rs.getString("field3"));
                // ...
                log.setField16(rs.getString("field16"));
                return log;
            }
        });
    return csvLog;
}
I think you need to be specific about what you mean by "100% CPU usage": whether it's the Java process or the MySQL server. As you have 600K records, trying to load everything into memory can easily end up in an OutOfMemoryError. The fact that this works for one user means that you've got enough heap space to process this number of records for just one user, and the symptoms surface when there are multiple users trying to use the same service.
The first issue I can see in your posted code is that you load everything into one big list, and the size of that list varies based on the content of the Log class. Using a list like this also means that you have to have enough memory to process the JDBC result set and generate a new list of Log instances. This can be a major problem with a growing number of users. These short-lived objects will cause frequent GC, and once GC cannot keep up with the amount of garbage being collected, it obviously fails. To solve this major issue, my suggestion is to use a scrollable ResultSet. Additionally, you can make this result set read-only; for example, below is a code fragment for creating a scrollable result set. Take a look at the documentation for how to use it.
Statement st = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_READ_ONLY);
The above option is suitable if you're using pure JDBC or the Spring JDBC template. If Hibernate is already used in your project, you can still achieve the same with the code fragment below. Again, please check the documentation for more information, especially if you have a different JPA provider.
StatelessSession session = sessionFactory.openStatelessSession();
Query query = session.createSQLQuery(queryStr).setCacheable(false).setFetchSize(Integer.MIN_VALUE).setReadOnly(true);
query.setParameter(query_param_key, query_paramter_value);
ScrollableResults resultSet = query.scroll(ScrollMode.FORWARD_ONLY);
This way you're not loading all the records into the Java process in one go; instead, they're loaded on demand, and you will have a small memory footprint at any given time. Note that the JDBC connection will be open until you're done processing the entire record set. This also means that your DB connection pool can be exhausted if many users download CSV files from this endpoint. You need to take measures to overcome this problem (e.g. use an API manager to rate-limit calls to this endpoint, read from a read replica, or whatever other viable option).
My other suggestion is to stream the data, which you have already partly done, so that records fetched from the DB are processed and sent to the client before the next set of records is processed. Again, I would suggest using a CSV library such as Super CSV to handle this, as these libraries are designed to handle a good load of data.
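To make the streaming suggestion concrete, here is a minimal sketch using Spring's RowCallbackHandler so that no List<Log> is built up in memory; the SQL, column names, and CSV header are placeholders, and a library such as Super CSV could replace the manual string concatenation:
import java.io.PrintWriter;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.servlet.http.HttpServletResponse;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;

public class StreamingCsvWriter {

    private final JdbcTemplate jdbcTemplate;

    public StreamingCsvWriter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void writeCsv(HttpServletResponse response, String sql,
                         String param1, String param2, String param3) throws Exception {
        response.setHeader("Content-type", "application/csv");
        response.setHeader("Content-disposition", "inline; filename=logData.csv");
        final PrintWriter out = response.getWriter();
        out.println("Field1,Field2,Field3");

        // Each row is written to the response as soon as it is read, so no
        // intermediate List<Log> is ever built. Depending on the driver, the
        // JDBC fetch size may also need tuning so rows are actually streamed.
        jdbcTemplate.query(sql, new Object[]{param1, param2, param3}, new RowCallbackHandler() {
            @Override
            public void processRow(ResultSet rs) throws SQLException {
                out.println(rs.getInt("field1") + "," + rs.getString("field2") + "," + rs.getString("field3"));
            }
        });
        out.flush();
    }
}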
Please note that this answer may not exactly answer your question, as you haven't provided the necessary parts of your source (such as how you retrieve data from the DB), but it should point you in the right direction to solve this issue.
Your problem is in loading all the data from the database onto the application server at once. Try running the query with limit and offset parameters (with a mandatory ORDER BY), push the loaded records to the client, and then load the next part of the data with a different offset, as sketched below. This helps you decrease the memory footprint and does not require keeping the database connection open all the time. Of course, the database will be loaded a bit more, but maybe the overall situation will be better. Try different limit values, for example 5K-50K, and monitor CPU usage on both the app server and the database.
If you can allow keeping many open connections to the database, @Bunti's answer is very good.
http://dev.mysql.com/doc/refman/5.7/en/select.html
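A rough sketch of that limit/offset approach (the SQL, table and column names, and the page size are assumptions):
import java.io.PrintWriter;
import java.util.List;
import java.util.Map;

import org.springframework.jdbc.core.JdbcTemplate;

public class PagedCsvExport {

    // Fetches the result set in fixed-size pages and writes each page out before loading the next
    public void export(JdbcTemplate jdbcTemplate, PrintWriter out) {
        final int pageSize = 10000;          // tune between ~5K and 50K as suggested above
        int offset = 0;
        while (true) {
            List<Map<String, Object>> page = jdbcTemplate.queryForList(
                    "SELECT field1, field2, field3 FROM log ORDER BY id LIMIT ? OFFSET ?",
                    pageSize, offset);
            for (Map<String, Object> row : page) {
                out.println(row.get("field1") + "," + row.get("field2") + "," + row.get("field3"));
            }
            out.flush();                     // push this page to the client before loading the next
            if (page.size() < pageSize) {
                break;                       // last page reached
            }
            offset += pageSize;
        }
    }
}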

Jena TDB hangs/freezes on named model access

I have a problem with Apache Jena TDB. Basically I create a new Dataset, load data from an RDF/XML file into a named model with the name "http://example.com/model/filename" where filename is the name of the XML/RDF file. After loading the data, all statements from the named model are inserted into the default model. The named model is kept in the dataset for backup reasons.
When I now try to query the named models in the Dataset, TDB freezes and the application seems to run in an infinite loop, so it is not terminated nor does it throw an exception.
What is causing that freeze and how can I prevent it?
Example code:
Dataset ds = TDBFactory.createDataset("tdb");
Model mod = ds.getDefaultModel();
File f = new File("example.rdf");
FileInputStream fis = new FileInputStream(f);
ds.begin(ReadWrite.WRITE);
// Get a new named model to load the data into
Model nm = ds.getNamedModel("http://example.com/model/example.rdf");
nm.read(fis, null);
// Do some queries on the Model using the utility methods of Model, no SPARQL used
// Add all statements from the named model to the default model
mod.add(nm);
ds.commit();
ds.end();
// So far everything works as expected, but the following line causes the freeze
Iterator<String> it = ds.listNames();
Any method call that accesses the existing named models causes the same freeze, so the same happens for getNamedModel("http://example.com/model/example.rdf"), for example. Adding new named models by calling getNamedModel("http://example.com/model/example123.rdf") works fine, so only access to existing models is broken.
Used environment: Linux 64bit, Oracle Java 1.7.0_09, Jena 2.7.4 (incl. TDB 0.9.4)
Thanks in advance for any help!
Edit: Fixed mistake in code fragment
Edit2: Solution (my last comment to AndyS answer)
Ok, I went through the whole program and added all the missing transactions. Now it is working as expected. I suspect Jena threw an exception during the shutdown sequence of my program, but that exception was not reported properly, and the "freeze" was caused by other threads not terminating correctly. Thanks for pointing out the faulty transaction usage.
Could you turn this into a test case and send it to the jena users mailing list please?
You should get the default model inside the transaction - you got it outside.
Also, if you have used a dataset transactionally, you can't use it untransactionally as you do at ds.listNames. It shouldn't freeze - you should get some kind of warning.
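To illustrate what the answer describes, a minimal sketch of fully transactional usage, assuming Jena 2.7.x / TDB 0.9.x as in the question; the paths and graph name follow the question's example:
import java.io.FileInputStream;
import java.util.Iterator;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.ReadWrite;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;

public class TransactionalTdbExample {
    public static void main(String[] args) throws Exception {
        Dataset ds = TDBFactory.createDataset("tdb");

        // WRITE transaction: load the file and copy it into the default model,
        // getting both models inside the transaction
        ds.begin(ReadWrite.WRITE);
        try {
            Model nm = ds.getNamedModel("http://example.com/model/example.rdf");
            nm.read(new FileInputStream("example.rdf"), null);
            Model mod = ds.getDefaultModel();
            mod.add(nm);
            ds.commit();
        } finally {
            ds.end();
        }

        // READ transaction: list the named graphs instead of calling listNames() outside a transaction
        ds.begin(ReadWrite.READ);
        try {
            Iterator<String> it = ds.listNames();
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        } finally {
            ds.end();
        }
    }
}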
