I have a Watcher that updates my data structures when a change is detected. However, if the change is not instantaneous (e.g. a large file being copied from another file system, or a big part of a file being modified), the data structure tries to update too early and throws an error.
How can I modify my code so that updateData() is called only after the last ENTRY_MODIFY arrives, rather than after every single ENTRY_MODIFY?
private static boolean processWatcherEvents() {
    WatchKey key;
    try {
        key = watcher.poll( 10, TimeUnit.MILLISECONDS );
    } catch ( InterruptedException e ) {
        return false;
    }
    Path directory = keys.get( key );
    if ( directory == null ) {
        return false;
    }
    for ( WatchEvent<?> event : key.pollEvents() ) {
        WatchEvent.Kind eventKind = event.kind();
        WatchEvent<Path> watchEvent = (WatchEvent<Path>) event;
        Path child = directory.resolve( watchEvent.context() );
        if ( eventKind == StandardWatchEventKinds.ENTRY_MODIFY ) {
            //TODO: Wait until modifications are "finished" before taking these actions.
            if ( Files.isDirectory( child ) ) {
                updateData( child );
            }
        }
        boolean valid = key.reset();
        if ( !valid ) {
            keys.remove( key );
        }
    }
    return true;
}
As @TT suggested, you can do this pretty easily with file locks.
When you get an event, acquire a lock on the file with the blocking lock() method before reading or writing. Because the call blocks, the code automatically waits until the write operation has finished.
FileChannel channel = new RandomAccessFile( file, "rw" ).getChannel();
try ( channel ) { // auto-closeable, calls channel.close() in a finally block
    channel.lock();    // wait until file modifications are finished
    channel.read(...); // now you can safely read the file
}
Note that file locks are held on behalf of the entire JVM, so they cannot coordinate multiple threads within the same JVM; and this only helps if the process writing the file also acquires a lock, since a writer that never locks the file will not be blocked out.
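If blocking indefinitely is a concern, the same idea can be expressed with tryLock() in a retry loop. This is just a sketch under the same assumption (the writing process also takes a FileLock); the retry interval is arbitrary:

// Sketch: poll with tryLock() until the writer releases the file.
// Assumes the writing process also takes a FileLock; 100 ms is arbitrary.
try ( FileChannel channel = new RandomAccessFile( file, "rw" ).getChannel() ) {
    FileLock lock = null;
    while ( ( lock = channel.tryLock() ) == null ) {
        Thread.sleep( 100 ); // still being written, try again shortly
    }
    try {
        // safe to read the file here
    } finally {
        lock.release();
    }
}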
Your problem can also be solved by using timestamps.
Create a map that stores the last-modified timestamp for each file path:
Map<Path, Long> fileTimeStamps;
When processing an event, check the file's last-modified timestamp:
long oldFileModifiedTimeStamp = fileTimeStamps.getOrDefault( filePath, 0L ); // avoids an NPE on the first event for a path
long newFileModifiedTimeStamp = filePath.toFile().lastModified();
if ( newFileModifiedTimeStamp > oldFileModifiedTimeStamp )
{
    onEventOccurred();
    fileTimeStamps.put( filePath, filePath.toFile().lastModified() );
}
I ended up writing a thread that keeps a list of things I want updated and delays actually updating them until 80 milliseconds have passed. Whenever an ENTRY_MODIFY event happens, it resets the counter. I think this is a good solution, but there may be a better one?
@SuppressWarnings({ "rawtypes", "unchecked" })
private static boolean processWatcherEvents() {
    WatchKey key;
    try {
        key = watcher.poll( 10, TimeUnit.MILLISECONDS );
    } catch ( InterruptedException e ) {
        return false;
    }
    Path directory = keys.get( key );
    if ( directory == null ) {
        return false;
    }
    for ( WatchEvent<?> event : key.pollEvents() ) {
        WatchEvent.Kind eventKind = event.kind();
        WatchEvent<Path> watchEvent = (WatchEvent<Path>) event;
        Path child = directory.resolve( watchEvent.context() );
        if ( eventKind == StandardWatchEventKinds.ENTRY_CREATE ) {
            if ( Files.isDirectory( child ) ) {
                loadMe.add( child );
            } else {
                loadMe.add( child.getParent() );
            }
        } else if ( eventKind == StandardWatchEventKinds.ENTRY_DELETE ) {
            // Handled by removeMissingFiles(), can ignore.
        } else if ( eventKind == StandardWatchEventKinds.ENTRY_MODIFY ) {
            System.out.println( "Modified: " + child.toString() ); //TODO: DD
            modifiedFileDelayedUpdater.addUpdateItem( child ); // same handling for files and directories
        } else if ( eventKind == StandardWatchEventKinds.OVERFLOW ) {
            for ( Path path : musicSourcePaths ) {
                updateMe.add( path );
            }
        }
        boolean valid = key.reset();
        if ( !valid ) {
            keys.remove( key );
        }
    }
    return true;
}
...
class UpdaterThread extends Thread {
    public static final int DELAY_LENGTH_MS = 80;
    public volatile int counter = DELAY_LENGTH_MS; // written by the watcher thread, read here
    Vector<Path> updateItems = new Vector<Path>();

    public void run() {
        while ( true ) {
            long sleepTime = 0;
            try {
                long startSleepTime = System.currentTimeMillis();
                Thread.sleep( 20 );
                sleepTime = System.currentTimeMillis() - startSleepTime;
            } catch ( InterruptedException e ) {} //TODO: Is this OK to do? Feels like a bad idea.
            if ( counter > 0 ) {
                counter -= sleepTime;
            } else if ( updateItems.size() > 0 ) {
                Vector<Path> copyUpdateItems = new Vector<Path>( updateItems );
                for ( Path path : copyUpdateItems ) {
                    Library.requestUpdate( path );
                    updateItems.remove( path );
                }
            }
        }
    }

    public void addUpdateItem( Path path ) {
        counter = DELAY_LENGTH_MS; // each new event restarts the delay window
        if ( !updateItems.contains( path ) ) {
            updateItems.add( path );
        }
    }
}
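For comparison, the same debounce can be written without a polling loop by rescheduling a task on a ScheduledExecutorService. A minimal sketch; the Library.requestUpdate call mirrors the code above, everything else (class and field names) is hypothetical:

import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.*;

// Sketch: each addUpdateItem() cancels the pending flush and schedules a new
// one, so the update only fires once events have been quiet for DELAY_MS.
class DebouncedUpdater {
    private static final long DELAY_MS = 80;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final Set<Path> updateItems = ConcurrentHashMap.newKeySet();
    private ScheduledFuture<?> pendingFlush;

    public synchronized void addUpdateItem( Path path ) {
        updateItems.add( path );
        if ( pendingFlush != null ) {
            pendingFlush.cancel( false ); // restart the quiet period
        }
        pendingFlush = scheduler.schedule( this::flush, DELAY_MS, TimeUnit.MILLISECONDS );
    }

    private void flush() {
        for ( Path path : updateItems ) {
            Library.requestUpdate( path ); // assumed from the code above
            updateItems.remove( path );
        }
    }
}

This avoids the 20 ms wake-up loop entirely: nothing runs until the scheduled flush fires.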
For a few days I've been trying to work out how to detect whether a player's account is authenticated with Mojang while the server is in offline mode.
Why do I want to do that?
Currently, I have a basic management system that checks whether the player's nickname exists in the Mojang database; if it does, setOnlineMode is set to true, otherwise it is set to false.
The system can display the player's skin and UUID, but the problem is that if a player considered offline purchases a premium account with the same nickname, he does not get his skin or his real UUID, because setOnlineMode is set to false to prevent the loss of his progress.
My goal is to build a system that detects that an offline user has just logged in with an authenticated Minecraft account, so that the server can offer him an automatic transfer of his progress to his new authentic UUID.
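For reference, the nickname check described above can be done against Mojang's public profile-lookup endpoint. A minimal sketch (the method name is hypothetical):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: returns true if the name exists in Mojang's account database.
// The endpoint answers 200 with a JSON profile when the name exists,
// and an empty (204/404-style) response when it does not.
static boolean isPremiumName( String name ) throws IOException {
    URL url = new URL( "https://api.mojang.com/users/profiles/minecraft/" + name );
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    con.setRequestMethod( "GET" );
    return con.getResponseCode() == 200;
}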
I did some research and experimented: for example, here I removed the online-mode condition to allow checking whether the player was authenticated, and here I deleted the disconnect when the player was not valid. That gave me a wonderful error:
13:13:31 [GRAVE] [Arbi13_] -> UpstreamBridge - encountered exception io.netty.handler.codec.EncoderException: java.lang.IllegalArgumentException: Cannot get ID for packet class net.md_5.bungee.protocol.packet.SetCompression in phase GAME with direction TO_CLIENT
at io.netty.handler.codec.MessageToByteEncoder.write(MessageToByteEncoder.java:125)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:801)
at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814)
at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1066)
at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:305)
at net.md_5.bungee.netty.ChannelWrapper.write(ChannelWrapper.java:60)
at net.md_5.bungee.UserConnection$1.sendPacket(UserConnection.java:148)
at net.md_5.bungee.UserConnection.setCompressionThreshold(UserConnection.java:697)
at net.md_5.bungee.connection.InitialHandler$6$1.run(InitialHandler.java:523)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.IllegalArgumentException: Cannot get ID for packet class net.md_5.bungee.protocol.packet.SetCompression in phase GAME with direction TO_CLIENT
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:399)
at net.md_5.bungee.protocol.Protocol$DirectionData.getId(Protocol.java:462)
at net.md_5.bungee.protocol.MinecraftEncoder.encode(MinecraftEncoder.java:23)
at net.md_5.bungee.protocol.MinecraftEncoder.encode(MinecraftEncoder.java:9)
at io.netty.handler.codec.MessageToByteEncoder.write(MessageToByteEncoder.java:107)
... 15 more
@Override
public void handle(LoginRequest loginRequest) throws Exception
{
    Preconditions.checkState( thisState == State.USERNAME, "Not expecting USERNAME" );
    this.loginRequest = loginRequest;
    if ( getName().contains( "." ) )
    {
        disconnect( bungee.getTranslation( "name_invalid" ) );
        return;
    }
    if ( getName().length() > 16 )
    {
        disconnect( bungee.getTranslation( "name_too_long" ) );
        return;
    }
    int limit = BungeeCord.getInstance().config.getPlayerLimit();
    if ( limit > 0 && bungee.getOnlineCount() > limit )
    {
        disconnect( bungee.getTranslation( "proxy_full" ) );
        return;
    }
    // If offline mode and they are already on, don't allow connect
    // We can just check by UUID here as names are based on UUID
    if ( !isOnlineMode() && bungee.getPlayer( getUniqueId() ) != null )
    {
        disconnect( bungee.getTranslation( "already_connected_proxy" ) );
        return;
    }
    Callback<PreLoginEvent> callback = new Callback<PreLoginEvent>()
    {
        @Override
        public void done(PreLoginEvent result, Throwable error)
        {
            if ( result.isCancelled() )
            {
                disconnect( result.getCancelReasonComponents() );
                return;
            }
            if ( ch.isClosed() )
            {
                return;
            }
            unsafe().sendPacket( request = EncryptionUtil.encryptRequest() );
            thisState = State.ENCRYPT;
        }
    };
    // fire pre login event
    bungee.getPluginManager().callEvent( new PreLoginEvent( InitialHandler.this, callback ) );
}
@Override
public void handle(final EncryptionResponse encryptResponse) throws Exception
{
    Preconditions.checkState( thisState == State.ENCRYPT, "Not expecting ENCRYPT" );
    SecretKey sharedKey = EncryptionUtil.getSecret( encryptResponse, request );
    BungeeCipher decrypt = EncryptionUtil.getCipher( false, sharedKey );
    ch.addBefore( PipelineUtils.FRAME_DECODER, PipelineUtils.DECRYPT_HANDLER, new CipherDecoder( decrypt ) );
    BungeeCipher encrypt = EncryptionUtil.getCipher( true, sharedKey );
    ch.addBefore( PipelineUtils.FRAME_PREPENDER, PipelineUtils.ENCRYPT_HANDLER, new CipherEncoder( encrypt ) );
    String encName = URLEncoder.encode( InitialHandler.this.getName(), "UTF-8" );
    MessageDigest sha = MessageDigest.getInstance( "SHA-1" );
    for ( byte[] bit : new byte[][]
    {
        request.getServerId().getBytes( "ISO_8859_1" ), sharedKey.getEncoded(), EncryptionUtil.keys.getPublic().getEncoded()
    } )
    {
        sha.update( bit );
    }
    String encodedHash = URLEncoder.encode( new BigInteger( sha.digest() ).toString( 16 ), "UTF-8" );
    String preventProxy = ( ( BungeeCord.getInstance().config.isPreventProxyConnections() ) ? "&ip=" + URLEncoder.encode( getAddress().getAddress().getHostAddress(), "UTF-8" ) : "" );
    String authURL = "https://sessionserver.mojang.com/session/minecraft/hasJoined?username=" + encName + "&serverId=" + encodedHash + preventProxy;
    Callback<String> handler = new Callback<String>()
    {
        @Override
        public void done(String result, Throwable error)
        {
            if ( error == null )
            {
                LoginResult obj = BungeeCord.getInstance().gson.fromJson( result, LoginResult.class );
                if ( obj != null && obj.getId() != null )
                {
                    loginProfile = obj;
                    name = obj.getName();
                    uniqueId = Util.getUUID( obj.getId() );
                    authenticated = true;
                    finish();
                    return;
                }
                if ( isOnlineMode() )
                {
                    disconnect( bungee.getTranslation( "offline_mode_player" ) );
                    return;
                }
                finish();
                return;
            } else
            {
                disconnect( bungee.getTranslation( "mojang_fail" ) );
                bungee.getLogger().log( Level.SEVERE, "Error authenticating " + getName() + " with minecraft.net", error );
            }
        }
    };
    HttpClient.get( authURL, ch.getHandle().eventLoop(), handler );
}
I don't think this is possible, because Minecraft does not exchange the session details. I know of servers that run a second proxy with online mode enabled for premium users to handle the session details, with both proxies leading to the same Bukkit servers "behind" them.
You might be looking for this: https://www.spigotmc.org/resources/fastlogin.14153/
I have not used it yet, but according to the reviews, it still works in the latest version.
Also, it's open source, so you might be able to peek into the code and see how it's done.
I'm developing a JavaFX application that synchronizes some data between two different databases.
In the call method I get all the data and store it in an ArrayList. Then I loop through the ArrayList and try to get the same data from the second database.
If it exists, I compare it for differences, and if there are differences I update it. Otherwise, if it doesn't exist, I insert it via a DAO object method.
The problem is that sometimes the second database takes some time to provide the response, so the process continues its execution and the new data gets compared with old data.
My question is: how can I pause the process until the data has all been fetched, and then proceed with the synchronization logic?
@Override
protected Map call() throws Exception {
    Map<String, Integer> m = new HashMap<>();
    updateTitle( "getting the data ..." );
    int i, updated = 0, inserted = 0;
    // creating the first database instance
    DAOFactory db1Dao = DAOFactory.getInstance( "db1" );
    // creating the first database dataObject instance
    Db1EmployerDAO empDb1Dao = db1Dao.getDAODb1Employer();
    // creating the second database instance
    DAOFactory db2Dao = DAOFactory.getInstance( "db2" );
    // creating the second database dataObject instance
    Db2EmployeurDAO empDb2Dao = db2Dao.getDAODb2Employer();
    Employer emp;
    // getting all the objects
    List<Employer> LEmpDb1 = empDb1Dao.getAll();
    updateTitle( "Data processing ..." );
    // for each entry in the list
    for ( i = 1; i <= LEmpDb1.size(); i++ ) {
        if ( isCancelled() )
            break;
        updateMessage( "Processing employer : " + LEmpDb1.get( i-1 ).getNemploy() + " " + LEmpDb1.get( i-1 ).getRaison() );
        // trying to get the object from the second database; the process
        // sometimes continues before the result has arrived, which is my problem
        emp = empDb2Dao.getEmployerByNo( LEmpDb1.get( i-1 ).getNemploy() );
        if ( emp != null ) {
            if ( !LEmpDb1.get( i-1 ).equals( emp ) )
                if ( empDb2Dao.update( LEmpDb1.get( i-1 ) ) ) {
                    updated++;
                    LOG.log( "MAJ employeur : " + LEmpDb1.get( i-1 ).getNemploy() + " => " + LEmpDb1.get( i-1 ).getDifferences( emp ) );
                }
        } else {
            if ( empDb2Dao.insert( LEmpDb1.get( i-1 ) ) )
                inserted++;
        }
        updateProgress( i, LEmpDb1.size() );
    }
    m.put( "upd", updated );
    m.put( "ins", inserted );
    m.put( "all", LEmpDb1.size() );
    return m;
}
The getEmployerByNo method
public synchronized Employer getEmployerByNo( String no_emp ) throws DAOException {
    Employer emp = null;
    Connection con = null;
    PreparedStatement stm = null;
    ResultSet res = null;
    try {
        con = dao.getConnection();
        stm = preparedRequestInitialisation( con, GET_BY_NO_SQL, no_emp );
        res = stm.executeQuery();
        if ( res.next() ) {
            // map() matches the result set columns to the object properties
            emp = map( res );
            LOG.info( "getting the employer : " + no_emp );
        }
    } catch ( SQLException e ) {
        throw new DAOException( e.getLocalizedMessage() );
    } finally {
        silentClose( res, stm, con );
    }
    return emp;
}
Look into using an ExecutorService, calling Future.get() as needed to wait for completion (see the ExecutorService and Future documentation). Here is a more-or-less complete example:
public class Application implements Runnable {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    public void run() {
        Dao firstDao = new DaoImpl();
        Dao secondDao = new AnotherDaoImpl();
        FetchAllTask fetchAll = new FetchAllTask(firstDao);
        Future<?> fetchAllFuture = pool.submit(fetchAll);
        try {
            fetchAllFuture.get(); // blocks until the fetch-all task has finished
        } catch (Exception e) {
            // TODO handle
            System.out.println("An exception occurred!");
            e.printStackTrace();
        }
        ConcurrentSkipListSet<AnObject> items = fetchAll.getItems();
        Iterator<AnObject> it = items.iterator();
        while (it.hasNext()) {
            // insert your cancellation logic here
            // ...
            AnObject daoObj = it.next();
            FetchOneTask fetchOne = new FetchOneTask(secondDao, daoObj.getId());
            Future<?> fetchOneFuture = pool.submit(fetchOne);
            try {
                fetchOneFuture.get(); // blocks until the single fetch has finished
                AnObject anotherDaoObj = fetchOne.getAnObject();
                if (anotherDaoObj == null) {
                    // the object retrieved by the first dao (first datasource)
                    // is not in the second; it needs to be inserted into the second
                    System.out.println(String.format("Inserting %s", daoObj));
                    secondDao.insert(daoObj);
                } else {
                    System.out.println(String.format("Updating %s to %s", anotherDaoObj, daoObj));
                    secondDao.update(daoObj);
                }
            } catch (Exception e) {
                System.out.println("An exception occurred!");
                e.printStackTrace();
            }
        }
        Set<AnObject> itemsInSecondDb = secondDao.fetchAll();
        for (AnObject o : itemsInSecondDb) {
            System.out.println(o);
        }
        pool.shutdown();
    }

    // ... invoke the app thread from somewhere else
}
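The example assumes task classes roughly like the following. These are hypothetical sketches that match the usage above; Dao, AnObject, and the fetchById method are stand-ins for your own types:

import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical task types matching the example above.
class FetchAllTask implements Runnable {
    private final Dao dao;
    private final ConcurrentSkipListSet<AnObject> items = new ConcurrentSkipListSet<>();

    FetchAllTask(Dao dao) { this.dao = dao; }

    @Override
    public void run() {
        items.addAll(dao.fetchAll()); // runs on a pool thread
    }

    ConcurrentSkipListSet<AnObject> getItems() { return items; }
}

class FetchOneTask implements Runnable {
    private final Dao dao;
    private final String id;          // whatever type getId() returns
    private volatile AnObject result; // visible to the thread calling getAnObject()

    FetchOneTask(Dao dao, String id) { this.dao = dao; this.id = id; }

    @Override
    public void run() {
        result = dao.fetchById(id); // hypothetical lookup; null if the row does not exist
    }

    AnObject getAnObject() { return result; }
}

The key point is that Future.get() makes the calling thread wait for the DAO call to finish, so the comparison never runs against stale data.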
I have a delay where my pushTextGreen output does not show up in my EditText until after the last brace of firstRollSP leaves scope. pushTextGreen is a custom function I wrote on the UI thread that adds text to my EditText in XML, which overlays a SurfaceView. I would like the green text to show up right away instead of waiting for the entire moveWholeAITurn sequence. Is there a way to accomplish that? I tried threading firstRollSP but there is still a delay, and I thought about threading pushTextGreen, but I'm not sure that is the answer yet.
Thank you...
static synchronized public void firstRollSP( ) throws InterruptedException
{
    // Roll for Player and announce
    mGame.mDice.setDie1( mGame.mDiceFirstRoll.getDie1( ) );
    mGame.mDice.setRolled( true );
    MainActivity.activity.pushTextGreen( Strings.get_First_roll_X_Die1( ) );
    // Roll for two and announce
    Thread one = new Thread( )
    {
        public void run( )
        {
            try
            {
                Thread.sleep( 2000 );
                mGame.mDice.setDie2( mGame.mDiceFirstRoll.getDie2( ) );
                while( mGame.mDice.getDie1( ) == mGame.mDice.getDie2( ) )
                    mGame.mDice.setDie2( (mRng.nextInt( 6 ) + 1) );
            }
            catch( InterruptedException e )
            {
                Log.d( "ACtionUP", "Interupted e" );
            }
        }
    };
    one.start( );
    one.join( );
    MainActivity.activity.pushTextGreen( "Android first roll is " + Integer.toString( mGame.mDice.getDie2() ) );
    if( H.initWonFirstRoll( ) )
    {
        MainActivity.activity.pushTextGreen( "Player won first roll." );
        Thread tInitWon = new Thread( )
        {
            public void run( )
            {
                try
                {
                    Thread.sleep( 2000 );
                    mGame.isFirstRoll = false;
                    mGame.isTurn = true;
                    mGameAI.isFirstRoll = false;
                    mGameAI.isTurn = false;
                    mGame.mDice.sort( );
                    mGame.mDice.setRolled( true );
                    mGame.mDice.setDiceAnimationComplete( true );
                    mGame.mOppDice.init( );
                    mGame.mPossibleIndexes.calcPossibleTriangles( );
                }
                catch( Exception e )
                {
                    Log.d( "ACtionUP", "Interupted e" );
                }
            }
        };
        tInitWon.start( );
        tInitWon.join( );
    }
    else
    {
        MainActivity.activity.pushTextGreen( "Android won first roll." );
        Thread tDroidWon = new Thread( )
        {
            public void run( )
            {
                try
                {
                    Thread.sleep( 2000 );
                    mGame.isFirstRoll = false;
                    mGame.isTurn = false;
                    mGameAI.isFirstRoll = false;
                    mGameAI.isTurn = true;
                    mGameAI.mDice.init( );
                    mGame.mOppDice.init( );
                    mGame.mDice.sort( );
                    mGameAI.mDice.setDie1( mGame.mDice.getDie1() );
                    mGameAI.mDice.setDie2( mGame.mDice.getDie2() );
                    mGame.mOppDice.setDie1( mGame.mDice.getDie1() );
                    mGame.mOppDice.setDie2( mGame.mDice.getDie2() );
                    Thread.sleep( 2000 );
                    mGameAI.mPossibleIndexes.calcPossibleTrianglesAI( );
                }
                catch( InterruptedException e )
                {
                    Log.d( "ACtionUP", "Interupted e" );
                }
            }
        };
        tDroidWon.start( );
        tDroidWon.join( );
        if( mGameAI.mPossibleIndexes.anyPossibles( ) )
        {
            moveWholeTurnAI( );
        }
        else
        {
            H.endTurnAI( );
        }
    }
}
The UI thread is an event loop: it will not update the screen until control returns to the main looper and the draw event is handled, that is, until whatever function of yours it calls exits.
You should never call Thread.join on the main thread. Your app will freeze if you do, and it may crash if it trips a watchdog timer. You need to refactor so that no joins are required on the UI thread.
In addition, I see a lot of MainActivity.activity. That's a huge code smell: if you're holding the activity in a static variable, you will have memory leaks, and you are almost certain to have problems if your app is relaunched. It's something you should never do.
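As an illustration of one possible refactor (not the asker's actual code): the delayed steps can be chained with Handler.postDelayed, so each pushTextGreen posts immediately and nothing ever blocks the main looper. A minimal sketch, with hypothetical step methods:

import android.os.Handler;
import android.os.Looper;

// Sketch: sequence the turn as chained delayed steps instead of join()s.
// Every step runs on the UI thread, so pushTextGreen can draw right away.
private final Handler handler = new Handler( Looper.getMainLooper() );

void firstRollSP() {
    MainActivity.activity.pushTextGreen( Strings.get_First_roll_X_Die1() );
    handler.postDelayed( this::announceSecondRoll, 2000 ); // replaces Thread.sleep(2000) + join()
}

void announceSecondRoll() {
    // hypothetical: the work of the "one" thread goes here
    MainActivity.activity.pushTextGreen( "Android first roll is " + mGame.mDice.getDie2() );
    handler.postDelayed( this::finishFirstRoll, 2000 );    // next step after another delay
}

void finishFirstRoll() {
    // hypothetical: the tInitWon / tDroidWon work goes here
}

Each pushTextGreen is followed by a return to the looper, so the text appears before the next step runs.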
After reconstructing an old piece of code I once wrote, then forgot, and have now rewritten, I am putting it here as a wiki for all to use :-)
So, basically: if you have memory leaks in a complex Android app, containing images and cross-references, how would you go about finding which (type of) objects are leaking? There are a few (very hard to learn and use) tools provided with the Android SDK, and probably more that I don't know of. Yet Java does provide PhantomReference as a means to do this, even though setting up the required classes can be a lot of work (and nasty too... JDK-8034946).
But what is the simplest and most effective way of doing so? My solution is below.
LeakCanary is a 3rd-party library which automatically detects memory leaks. After adding the dependency, you can add the following line to your Application class:
LeakCanary.install(this);
The library provides a nice notification and trace of the leak. You can also define your own reference watchers (although the default ones seem to work fairly well).
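For completeness, a minimal sketch of the Application class wiring, assuming LeakCanary 1.x (where install() is the entry point):

import android.app.Application;
import com.squareup.leakcanary.LeakCanary;

public class MyApp extends Application {
    @Override
    public void onCreate() {
        super.onCreate();
        if (LeakCanary.isInAnalyzerProcess(this)) {
            // This process is dedicated to LeakCanary's heap analysis;
            // don't initialize the app here.
            return;
        }
        LeakCanary.install(this);
    }
}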
My solution in one class: "MemCheck"
To monitor any object, just call:
MemCheck.add( this ); // in any class constructor
This will "monitor" the number of objects allocated, and most importantly - Deallocated.
To log the leaks at any required time, call:
MemCheck.countAndLog();
As an alternative, set MemCheck.periodic = 5 (a number of seconds).
This will report the number of monitored objects in memory every 5 seconds, and also conveniently log the used/free memory.
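For example, to track instances of one of your own classes (the class name here is hypothetical), register each instance in its constructor:

public class BitmapCache {
    public BitmapCache() {
        MemCheck.add( this ); // register every instance for leak counting
    }
}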
So, MemCheck.java:
package com.xyz.util;

import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map.Entry;
import java.util.TreeMap;

import android.os.Handler;
import android.util.Log;

public class MemCheck
{
    private static final boolean enabled = true;
    private static final int periodic = 0; // seconds, 0 == disabled
    private static final String tag = MemCheck.class.getName();
    private static TreeMap<String, RefCount> mObjectMap = new TreeMap<String, RefCount>();
    private static Runnable mPeriodicRunnable = null;
    private static Handler mPeriodicHandler = null;

    public static void add( Object object )
    {
        if( !enabled )
            return;
        synchronized( mObjectMap )
        {
            String name = object.getClass().getName();
            RefCount queue = mObjectMap.get( name );
            if( queue == null )
            {
                queue = new RefCount();
                mObjectMap.put( name, queue );
            }
            queue.add( object );
        }
    }

    public static void countAndLog()
    {
        if( !enabled )
            return;
        System.gc(); // encourage collection so phantom references get enqueued
        Log.d( tag, "Log report starts" );
        Iterator<Entry<String, RefCount>> entryIter = mObjectMap.entrySet().iterator();
        while( entryIter.hasNext() )
        {
            Entry<String, RefCount> entry = entryIter.next();
            String name = entry.getKey();
            RefCount refCount = entry.getValue();
            Log.d( tag, "Class " + name + " has " + refCount.countRefs() + " objects in memory." );
        }
        logMemoryUsage();
        Log.d( tag, "Log report done" );
    }

    public static void logMemoryUsage()
    {
        if( !enabled )
            return;
        Runtime runtime = Runtime.getRuntime();
        Log.d( tag, "Max Heap: " + runtime.maxMemory() / 1048576 + " MB, Used: " + runtime.totalMemory() / 1048576 +
               " MB, Free: " + runtime.freeMemory() / 1048576 + " MB" );
        if( periodic > 0 )
        {
            if( mPeriodicRunnable != null )
                mPeriodicHandler.removeCallbacks( mPeriodicRunnable );
            if( mPeriodicHandler == null )
                mPeriodicHandler = new Handler();
            mPeriodicRunnable = new Runnable()
            {
                @Override
                public void run()
                {
                    mPeriodicRunnable = null;
                    countAndLog();
                    logMemoryUsage(); // this schedules the next run
                }
            };
            mPeriodicHandler.postDelayed( mPeriodicRunnable, periodic * 1000 );
        }
    }

    private static class RefCount
    {
        private ReferenceQueue<Object> mQueue = new ReferenceQueue<Object>();
        private HashSet<Object> mRefHash = new HashSet<Object>();
        private int mRefCount = 0;

        public void add( Object o )
        {
            synchronized( this )
            {
                // Note: the references MUST be kept alive for them to be enqueued
                mRefHash.add( new PhantomReference<Object>( o, mQueue ) );
                mRefCount++;
            }
        }

        public int countRefs()
        {
            synchronized( this )
            {
                Object ref;
                while( ( ref = mQueue.poll() ) != null )
                {
                    mRefHash.remove( ref );
                    mRefCount--;
                }
                return mRefCount;
            }
        }
    }
}
Let me start out by saying that I'm new to Scala; however, I find the Actor based concurrency model interesting, and I tried to give it a shot for a relatively simple application. The issue that I'm running into is that, although I'm able to get the application to work, the result is far less efficient (in terms of real time, CPU time, and memory usage) than an equivalent Java based solution that uses threads that pull messages off an ArrayBlockingQueue. I'd like to understand why. I suspect that it's likely my lack of Scala knowledge, and that I'm causing all the inefficiency, but after several attempts to rework the application without success, I decided to reach out to the community for help.
My problem is this:
I have a gzipped file with many lines in the format of:
SomeID comma_separated_list_of_values
For example:
1234 12,45,82
I'd like to parse each line and get an overall count of the number of occurrences of each value in the comma separated list.
This file may be pretty large (several GB compressed), but the number of unique values per file is pretty small (at most 500). I figured this would be a pretty good opportunity to try to write an Actor-based concurrent Scala application. My solution involves a main driver that creates a pool of parser Actors. The main driver then reads lines from stdin and passes each line off to an Actor that parses the line and keeps a local count of the values. When the main driver has read the last line, it passes a message to each actor indicating that all lines have been read. When the actors receive the 'done' message, they pass their counts to an aggregator that sums the counts from all actors. Once the counts from all parsers have been aggregated, the main driver prints out the statistics.
The problem:
The main issue that I'm encountering is the incredible amount of inefficiency of this application. It uses far more CPU and far more memory than an "equivalent" Java application that uses threads and an ArrayBlockingQueue. To put this in perspective, here are some stats that I gathered for a 10 million line test input file:
Scala 1 Actor (parser):
real 9m22.297s
user 235m31.070s
sys 21m51.420s
Java 1 Thread (parser):
real 1m48.275s
user 1m58.630s
sys 0m33.540s
Scala 5 Actors:
real 2m25.267s
user 63m0.730s
sys 3m17.950s
Java 5 Threads:
real 0m24.961s
user 1m52.650s
sys 0m20.920s
In addition, top reports that the Scala application has about 10x the resident memory size. So we're talking about orders of magnitude more CPU and memory here for orders of magnitude worse performance, and I just can't figure out what is causing this. Is it a GC issue, or am I somehow creating far more copies of objects than I realize?
Additional details that may or may not be of importance:
The Scala application is wrapped by a Java class so that I could deliver a self-contained executable JAR file (I don't have the Scala jars on every machine that I might want to run this app on).
The application is being invoked as follows: gunzip -c gzFilename | java -jar StatParser.jar
Here is the code:
Main Driver:
import scala.actors.Actor._
import scala.collection.{ immutable, mutable }
import scala.io.Source

class StatCollector( numParsers: Int ) {
  private val parsers = new mutable.ArrayBuffer[StatParser]()
  private val aggregator = new StatAggregator()

  def generateParsers {
    for ( i <- 1 to numParsers ) {
      val parser = new StatParser( i, aggregator )
      parser.start
      parsers += parser
    }
  }

  def readStdin {
    var nextParserIdx = 0
    var lineNo = 1
    for ( line <- Source.stdin.getLines() ) {
      parsers( nextParserIdx ) ! line
      nextParserIdx += 1
      if ( nextParserIdx >= numParsers ) {
        nextParserIdx = 0
      }
      lineNo += 1
    }
  }

  def informParsers {
    for ( parser <- parsers ) {
      parser ! true
    }
  }

  def printCounts {
    val countMap = aggregator.getCounts()
    println( "ID,Count" )
    /*
    for ( key <- countMap.keySet ) {
      println( key + "," + countMap.getOrElse( key, 0 ) )
      //println( "Campaign '" + key + "': " + countMap.getOrElse( key, 0 ) )
    }
    */
    countMap.toList.sorted foreach {
      case (key, value) =>
        println( key + "," + value )
    }
  }

  def processFromStdIn {
    aggregator.start
    generateParsers
    readStdin
    process
  }

  def process {
    informParsers
    var completedParserCount = aggregator.getNumParsersAggregated
    while ( completedParserCount < numParsers ) {
      Thread.sleep( 250 )
      completedParserCount = aggregator.getNumParsersAggregated
    }
    printCounts
  }
}
The Parser Actor:
import scala.actors.Actor
import collection.mutable.HashMap
import scala.util.matching

class StatParser( val id: Int, val aggregator: StatAggregator ) extends Actor {
  private var countMap = new HashMap[String, Int]()
  private val sep1 = "\t"
  private val sep2 = ","

  def getCounts(): HashMap[String, Int] = {
    return countMap
  }

  def act() {
    loop {
      react {
        case line: String =>
          {
            val idx = line.indexOf( sep1 )
            var currentCount = 0
            if ( idx > 0 ) {
              val tokens = line.substring( idx + 1 ).split( sep2 )
              for ( token <- tokens ) {
                if ( !token.equals( "" ) ) {
                  currentCount = countMap.getOrElse( token, 0 )
                  countMap( token ) = ( 1 + currentCount )
                }
              }
            }
          }
        case doneProcessing: Boolean =>
          {
            if ( doneProcessing ) {
              // Send my stats to Aggregator
              aggregator ! this
            }
          }
      }
    }
  }
}
The Aggregator Actor:
import scala.actors.Actor
import collection.mutable.HashMap

class StatAggregator extends Actor {
  private var countMap = new HashMap[String, Int]()
  private var parsersAggregated = 0

  def act() {
    loop {
      react {
        case parser: StatParser =>
          {
            val cm = parser.getCounts()
            for ( key <- cm.keySet ) {
              val currentCount = countMap.getOrElse( key, 0 )
              val incAmt = cm.getOrElse( key, 0 )
              countMap( key ) = ( currentCount + incAmt )
            }
            parsersAggregated += 1
          }
      }
    }
  }

  def getNumParsersAggregated: Int = {
    return parsersAggregated
  }

  def getCounts(): HashMap[String, Int] = {
    return countMap
  }
}
Any help that could be offered in understanding what is going on here would be greatly appreciated.
Thanks in advance!
---- Edit ----
Since many people responded and asked for the Java code, here is the simple Java app that I created for comparison purposes. I realize this is not great Java code, but when I saw the performance of the Scala application, I just whipped up something quick to see how a Java thread-based implementation would perform as a baseline:
Parsing Thread:
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class JStatParser extends Thread
{
    private ArrayBlockingQueue<String> queue;
    private Map<String, Integer> countMap;
    private boolean done;

    public JStatParser( ArrayBlockingQueue<String> q )
    {
        super( );
        queue = q;
        countMap = new Hashtable<String, Integer>( );
        done = false;
    }

    public Map<String, Integer> getCountMap( )
    {
        return countMap;
    }

    public void alldone( )
    {
        done = true;
    }

    @Override
    public void run( )
    {
        String line = null;
        while( !done || queue.size( ) > 0 )
        {
            try
            {
                // line = queue.take( );
                line = queue.poll( 100, TimeUnit.MILLISECONDS );
                if( line != null )
                {
                    int idx = line.indexOf( "\t" ) + 1;
                    for( String token : line.substring( idx ).split( "," ) )
                    {
                        if( !token.equals( "" ) )
                        {
                            if( countMap.containsKey( token ) )
                            {
                                Integer currentCount = countMap.get( token );
                                currentCount++;
                                countMap.put( token, currentCount );
                            }
                            else
                            {
                                countMap.put( token, new Integer( 1 ) );
                            }
                        }
                    }
                }
            }
            catch( InterruptedException e )
            {
                // TODO Auto-generated catch block
                System.err.println( "Failed to get something off the queue: "
                        + e.getMessage( ) );
                e.printStackTrace( );
            }
        }
    }
}
Driver:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ArrayBlockingQueue;

public class JPS
{
    public static void main( String[] args )
    {
        if( args.length <= 0 || args.length > 2 || args[0].equals( "-?" ) )
        {
            System.err.println( "Usage: JPS numParsers [filename]" );
            System.exit( -1 );
        }
        int numParsers = Integer.parseInt( args[0] );
        ArrayBlockingQueue<String> q = new ArrayBlockingQueue<String>( 1000 );
        List<JStatParser> parsers = new ArrayList<JStatParser>( );
        BufferedReader reader = null;
        try
        {
            if( args.length == 2 )
            {
                reader = new BufferedReader( new FileReader( args[1] ) );
            }
            else
            {
                reader = new BufferedReader( new InputStreamReader( System.in ) );
            }
            for( int i = 0; i < numParsers; i++ )
            {
                JStatParser parser = new JStatParser( q );
                parser.start( );
                parsers.add( parser );
            }
            String line = null;
            while( (line = reader.readLine( )) != null )
            {
                try
                {
                    q.put( line );
                }
                catch( InterruptedException e )
                {
                    // TODO Auto-generated catch block
                    System.err.println( "Failed to add line to q: "
                            + e.getMessage( ) );
                    e.printStackTrace( );
                }
            }
            // At this point, we've put everything on the queue, now we just
            // need to wait for it to be processed.
            while( q.size( ) > 0 )
            {
                try
                {
                    Thread.sleep( 250 );
                }
                catch( InterruptedException e )
                {
                }
            }
            Map<String, Integer> countMap = new Hashtable<String, Integer>( );
            for( JStatParser jsp : parsers )
            {
                jsp.alldone( );
                Map<String, Integer> cm = jsp.getCountMap( );
                for( String key : cm.keySet( ) )
                {
                    if( countMap.containsKey( key ) )
                    {
                        Integer currentCount = countMap.get( key );
                        currentCount += cm.get( key );
                        countMap.put( key, currentCount );
                    }
                    else
                    {
                        countMap.put( key, cm.get( key ) );
                    }
                }
            }
            System.out.println( "ID,Count" );
            for( String key : new TreeSet<String>( countMap.keySet( ) ) )
            {
                System.out.println( key + "," + countMap.get( key ) );
            }
            for( JStatParser parser : parsers )
            {
                try
                {
                    parser.join( 100 );
                }
                catch( InterruptedException e )
                {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
            System.exit( 0 );
        }
        catch( IOException e )
        {
            System.err.println( "Caught exception: " + e.getMessage( ) );
            e.printStackTrace( );
        }
    }
}
I'm not sure this is a good test case for actors. For one thing, there's almost no interaction between actors. This is a simple map/reduce, which calls for parallelism, not concurrency.
The overhead on the actors is also pretty heavy, and I don't know how many actual threads are being allocated. Depending on how many processors you have, you might have fewer threads than in the Java program -- which seems to be the case, given that the speed-up is 4x instead of 5x.
And the way you wrote the actors is optimized for idle actors, the kind of situation where you have hundreds or thousands of actors but only a few of them doing actual work at any time. If you wrote the actors with while/receive instead of loop/react, they'd perform better.
Now, actors would make it easy to distribute the application over many computers, except that you violated one of the tenets of actors: you are calling methods on the actor object. You should never do that with actors and, in fact, Akka prevents you from doing so. A more actor-ish way of doing this would be for the aggregator to ask each actor for their key sets, compute their union, and then, for each key, ask all actors to send their count for that key.
I'm not sure, however, that the actor overhead is what you are seeing. You provided no information about the Java implementation, but I daresay you use mutable maps, and maybe even a single concurrent mutable map -- a very different implementation than what you are doing in Scala.
There's also no information on how the file is read (such a big file might have buffering issues), or how it is parsed in Java. Since most of the work is reading and parsing the file, not counting the tokens, differences in implementation there can easily overcome any other issue.
Finally, about resident memory size: Scala has a 9 MB library (in addition to what the JVM brings), which might be what you are seeing. Of course, if you are using a single concurrent map in Java vs 6 immutable maps in Scala, that will certainly make a big difference in memory usage patterns.
Scala actors are giving way to Akka actors these days... and more is coming - Viktor is hAkking further to make the latest the best: https://twitter.com/viktorklang/status/229694698397257728
BTW: open source is a great power! This day should be a holiday for the whole JVM-based community:
http://www.marketwire.com/press-release/azul-systems-announces-new-initiative-support-open-source-community-with-free-zing-jvm-1684899.htm