My objective is to count the frequency of each word while reading a large file using multiple threads.
I am implementing the Runnable interface to achieve multi-threading, but the program does not give the correct answer every time I execute it: sometimes the output is correct and sometimes it is not. Using the Callable interface instead of Runnable, the program executes correctly without any error.
This is the main class:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class WordFrequencyRunnableTest {
public static void main(String[] args) throws IOException {
long startTime = System.currentTimeMillis();
String filePath = "C:/Users/Mukesh Kumar/Desktop/data.txt";
WordFrequencyRunnableTest runnableTest = new WordFrequencyRunnableTest();
Map<String, Integer> wordFrequencies = runnableTest.parseLines(filePath);
runnableTest.printResult(wordFrequencies);
long elapsedTime = System.currentTimeMillis() - startTime;
System.out.println("Total execution time in millis: " + elapsedTime);
}
public Map<String, Integer> parseLines(String filePath) throws IOException {
Map<String, Integer> wordFrequencies = new HashMap<>();
try (BufferedReader bufferedReader = new BufferedReader(new FileReader(filePath))) {
String eachLine = bufferedReader.readLine();
while (eachLine != null) {
List<String> linesForEachThread = new ArrayList<>();
while (linesForEachThread.size() != 100 && eachLine != null) {
linesForEachThread.add(eachLine);
eachLine = bufferedReader.readLine();
}
WordFrequencyUsingRunnable task = new WordFrequencyUsingRunnable(linesForEachThread, wordFrequencies);
Thread thread = new Thread(task);
thread.start();
}
}
return wordFrequencies;
}
public void printResult(Map<String, Integer> wordFrequencies) {
wordFrequencies.forEach((key, value) -> System.out.println(key + " " + value));
}
}
And this is the logic class:
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
public class WordFrequencyUsingRunnable implements Runnable {
private final List<String> linesForEachThread;
private final Map<String, Integer> wordFrequencies;
public WordFrequencyUsingRunnable(List<String> linesForEachThread, Map<String, Integer> wordFrequencies) {
this.linesForEachThread = linesForEachThread;
this.wordFrequencies = wordFrequencies;
}
@Override
public void run() {
List<String> currentThreadLines = new ArrayList<>(linesForEachThread);
for (String eachLine : currentThreadLines) {
String[] eachLineWords = eachLine.toLowerCase().split("([,.\\s]+)");
synchronized (wordFrequencies) {
for (String eachWord : eachLineWords) {
if (wordFrequencies.containsKey(eachWord)) {
wordFrequencies.replace(eachWord, wordFrequencies.get(eachWord) + 1);
}
wordFrequencies.putIfAbsent(eachWord, 1);
}
}
}
}
}
I am hoping for good responses, and thanks in advance for the help.
You should wait for all the threads to finish before printing the results.
public class WordFrequencyRunnableTest {
List<Thread> threads = new ArrayList<>();
public static void main(String[] args) throws IOException, InterruptedException {
...
...
Map<String, Integer> wordFrequencies = runnableTest.parseLines(filePath);
for (Thread thread : runnableTest.threads) {
thread.join(); // Wait for each worker thread to finish.
}
runnableTest.printResult(wordFrequencies);
...
...
}
public Map<String, Integer> parseLines(String filePath) throws IOException {
Map<String, Integer> wordFrequencies = new HashMap<>();
try (BufferedReader bufferedReader = new BufferedReader(new FileReader(filePath))) {
String eachLine = bufferedReader.readLine();
while (eachLine != null) {
List<String> linesForEachThread = new ArrayList<>();
while (linesForEachThread.size() != 100 && eachLine != null) {
linesForEachThread.add(eachLine);
eachLine = bufferedReader.readLine();
}
WordFrequencyUsingRunnable task = new WordFrequencyUsingRunnable(linesForEachThread, wordFrequencies);
Thread thread = new Thread(task);
thread.start();
threads.add(thread); // Add thread to the list.
}
}
return wordFrequencies;
}
}
PS - You can use ConcurrentHashMap<String, AtomicInteger> to avoid having to synchronize access to the hashmap. The program will run faster that way.
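For example, here is a minimal sketch of what run() could look like under that change, assuming the shared map is declared as a ConcurrentHashMap<String, AtomicInteger> instead of a HashMap<String, Integer>:

@Override
public void run() {
    for (String eachLine : linesForEachThread) {
        String[] eachLineWords = eachLine.toLowerCase().split("([,.\\s]+)");
        for (String eachWord : eachLineWords) {
            // computeIfAbsent is atomic on a ConcurrentHashMap, and AtomicInteger
            // handles the increment, so no synchronized block is needed.
            wordFrequencies.computeIfAbsent(eachWord, k -> new AtomicInteger(0))
                           .incrementAndGet();
        }
    }
}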
Hi, I am new to Java concurrency. I am trying to double the contents of a list using fork/join, dividing the task into multiple parts.
The task completes, but the result never arrives.
package com.learning;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.TimeUnit;
class DoubleNumbers extends RecursiveTask<List<Integer>> {
private final List<Integer> listToDo;
public DoubleNumbers(List<Integer> list) {
System.out.println("Cons Called"+list.get(0));
this.listToDo = list;
}
@Override
protected List<Integer> compute() {
List<DoubleNumbers> doubleNumbersList= new ArrayList<>();
System.out.println(Thread.currentThread().toString());
for (int i = 0; i < listToDo.size(); i++) {
listToDo.set(i, listToDo.get(i) * 2);
}
return listToDo;
}
}
public class FJPExample {
public static void main(String[] args) {
List<Integer> arrayList = new ArrayList<>();
for (int i = 0; i < 149; i++) {
arrayList.add(i, i);
}
ForkJoinPool forkJoinPool = new ForkJoinPool(4);
System.out.println(forkJoinPool.getParallelism());
DoubleNumbers doubleNumbers = new DoubleNumbers(arrayList.subList(0, 49));
DoubleNumbers doubleNumbers50ToNext = new DoubleNumbers(arrayList.subList(50, 99));
DoubleNumbers doubleNumbers100ToNext = new DoubleNumbers(arrayList.subList(100, 149));
forkJoinPool.submit(doubleNumbers);
forkJoinPool.execute(doubleNumbers50ToNext);
forkJoinPool.execute(doubleNumbers100ToNext);
do {
System.out.println("Parallel " + forkJoinPool.getParallelism());
System.out.println("isWorking" + forkJoinPool.getRunningThreadCount());
System.out.println("isQSubmission" + forkJoinPool.getQueuedSubmissionCount());
try {
TimeUnit.SECONDS.sleep(1000);
} catch (InterruptedException e) {
//
}
} while ((!doubleNumbers.isDone()) || (!doubleNumbers50ToNext.isDone()) || (!doubleNumbers100ToNext.isDone()));
forkJoinPool.shutdown(); // Line 56
arrayList.addAll(doubleNumbers.join());
arrayList.addAll(doubleNumbers50ToNext.join());
arrayList.addAll(doubleNumbers100ToNext.join());
System.out.println(arrayList.size());
arrayList.forEach(System.out::println);
}
}
If I debug my task, I can see that the numbers get doubled, but the result never arrives at line no. 56.
The issue is with the code arrayList.addAll(doubleNumbers.join()) at lines 54, 55 and 56, because it can result in a ConcurrentModificationException. So, what you can do is replace those lines with the lines below, and it will work. It works because you used arrayList.subList at line 36, and a sub-list is backed by the same ArrayList, so the doubled values are already in place (read its Javadoc for more info):
doubleNumbers.join();
doubleNumbers50ToNext.join();
doubleNumbers100ToNext.join();
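To see why no addAll is needed, here is a small standalone illustration (with made-up values, not from the original program) of how subList returns a view backed by the original list, so writes through the view show up in the list itself:

import java.util.ArrayList;
import java.util.List;

public class SubListViewDemo {
    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(List.of(1, 2, 3, 4));
        List<Integer> firstHalf = numbers.subList(0, 2); // a view, not a copy
        firstHalf.set(0, firstHalf.get(0) * 2);          // write through the view
        System.out.println(numbers);                     // prints [2, 2, 3, 4]
    }
}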
I want to execute a Kafka producer using multiple threads. Below is the code that I have tried. I am unsure how to implement threading in a Kafka producer since I am not well versed in thread programming.
Below is the code for my producer.
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.serialization.StringSerializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class KafkaProducerWithThread {
//init params
final String bootstrapServer = "127.0.0.1:9092";
final String topicName = "spark-data-topic";
final String csvFileName = "unique_products.csv";
final static int MAX_THREAD = 2; // number of threads to create
//Logger
final Logger logger = LoggerFactory.getLogger(KafkaProducerWithThread.class);
public KafkaProducerWithThread() throws FileNotFoundException {
}
public static void main(String[] args) throws IOException {
new KafkaProducerWithThread().runProducer();
}
public void runProducer() throws IOException {
//Read the CSV file from Resources folder as BufferedReader
ClassLoader classLoader = new KafkaProducerWithThread().getClass().getClassLoader();
BufferedReader reader = new BufferedReader(new FileReader(classLoader.getResource(csvFileName).getFile()));
//Create a Kafka Producer
org.apache.kafka.clients.producer.KafkaProducer<String, String> producer = createKafkaProducer();
//Kafka Producer Metrics
Metric requestTotalMetric = null;
for (Map.Entry<MetricName, ? extends Metric> entry : producer.metrics().entrySet()) {
if ("request-total".equals(entry.getKey().name())) {
requestTotalMetric = entry.getValue();
}
}
//Thread
ExecutorService executorService = Executors.newFixedThreadPool(MAX_THREAD);
//Read the CSV file line by line
String line = "";
int i = 0;
while ((line = reader.readLine()) != null) {
i++;
String key = "products_" + i;
//Create a ProducerRecord
ProducerRecord<String, String> csvProducerRecord = new ProducerRecord<>(topicName, key, line.trim());
//Send the data - Asynchronously
producer.send(csvProducerRecord, new Callback() {
@Override
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
//executes every time a record is sent successfully or an exception is thrown
if (e == null) {
//the record was sent successfully
// logger.info("Received new metadata. \n" +
// "Topic: " + recordMetadata.topic() + "\n" +
// "Partition: " + recordMetadata.partition() + "\n" +
// "Offset: " + recordMetadata.offset() + "\n" +
// "Timestamp: " + recordMetadata.timestamp());
} else {
logger.error("Error while producing", e);
}
}
});
if (i % 1000 == 0){
logger.info("Record #: " + i + " Request rate: " + requestTotalMetric.metricValue());
}
}
//Adding a shutdown hook
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
logger.info("Stopping the Producer!");
producer.flush();
producer.close();
logger.info("Stopped the Producer!");
}));
}
public org.apache.kafka.clients.producer.KafkaProducer<String, String> createKafkaProducer() {
//Create Producer Properties
Properties properties = new Properties();
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
properties.setProperty(ProducerConfig.ACKS_CONFIG, "all");
properties.setProperty(ProducerConfig.RETRIES_CONFIG, "5");
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true"); // For an idempotent producer
//kafka can detect whether it's a duplicate data based on the producer request id.
//Create high throughput Producer at the expense of latency & CPU
properties.setProperty(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
properties.setProperty(ProducerConfig.LINGER_MS_CONFIG, "60");
properties.setProperty(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(32 * 1024)); //32KB batch size
//Create Kafka Producer
org.apache.kafka.clients.producer.KafkaProducer<String, String> csvProducer = new org.apache.kafka.clients.producer.KafkaProducer<String, String>(properties);
return csvProducer;
}
}
Can anyone help me implement threading in my Kafka producer program?
My producer will be producing over a million records, so I want to use multiple threads for it. I am aware that ExecutorService is used for thread programming, but I am not sure how to apply it in this case.
Thanks.
Create a MessageSender class as given below.
After creating the producer, create a new MessageSender object taking the producer record and the producer as constructor args.
Then invoke executorService.submit() to perform the task.
class Producer {
    void runProducer() throws IOException {
        ExecutorService executorService = Executors.newFixedThreadPool(MAX_THREAD);
        // Read the CSV file line by line
        String line;
        int i = 0;
        while ((line = reader.readLine()) != null) {
            i++;
            String key = "products_" + i;
            // Create the producer record
            ProducerRecord<String, String> csvProducerRecord =
                    new ProducerRecord<>(topicName, key, line.trim());
            // Hand each send off to the thread pool
            executorService.submit(new MessageSender(csvProducerRecord, producer));
        }
        executorService.shutdown();
    }
}
// Thread class
class MessageSender implements Runnable {
    private final ProducerRecord<String, String> record;
    private final KafkaProducer<String, String> producer;
    MessageSender(ProducerRecord<String, String> record, KafkaProducer<String, String> producer) {
        // Store the record and producer in class-level variables
        this.record = record;
        this.producer = producer;
    }
    @Override
    public void run() {
        producer.send(record);
    }
}
The code below tests the response time of reading www.google.com into a BufferedReader. I plan on using this code to test the response times of other sites and web services on our intranet. The test below runs for 20 seconds and opens 4 requests per second:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.Map.Entry;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.junit.Test;
public class ResponseTimeTest {
private static final int NUMBER_REQUESTS_PER_SECOND = 4;
private static final int TEST_EXECUTION_TIME = 20000;
private static final ConcurrentHashMap<Long, Long> timingMap = new ConcurrentHashMap<Long, Long>();
@Test
public void testResponseTime() throws InterruptedException {
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(10);
scheduler.scheduleAtFixedRate(new RequestThreadCreator(), 0, 1, TimeUnit.SECONDS);
Thread.sleep(TEST_EXECUTION_TIME);
System.out.println("Start Time, End Time, Total Time");
for (Entry<Long, Long> entry : timingMap.entrySet())
{
System.out.println(entry.getKey() + "," + entry.getValue() +","+(entry.getValue() - entry.getKey()));
}
}
private final class RequestThreadCreator implements Runnable {
public void run() {
ExecutorService es = Executors.newCachedThreadPool();
for (int i = 1; i <= NUMBER_REQUESTS_PER_SECOND; i++) {
es.execute(new RequestThread());
}
es.shutdown();
}
}
private final class RequestThread implements Runnable {
public void run() {
long startTime = System.currentTimeMillis();
try {
URL oracle = new URL("http://www.google.com/");
URLConnection yc = oracle.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
while ((in.readLine()) != null) {
}
in.close();
} catch (Exception e) {
e.printStackTrace();
}
long endTime = System.currentTimeMillis();
timingMap.put(startTime, endTime);
}
}
}
The output is:
Start Time, End Time, Total Time
1417692221531,1417692221956,425
1417692213530,1417692213869,339
1417692224530,1417692224983,453
1417692210534,1417692210899,365
1417692214530,1417692214957,427
1417692220530,1417692221041,511
1417692209530,1417692209949,419
1417692215532,1417692215950,418
1417692214533,1417692215075,542
1417692213531,1417692213897,366
1417692212530,1417692212924,394
1417692219530,1417692219897,367
1417692226532,1417692226876,344
1417692211530,1417692211955,425
1417692209529,1417692209987,458
1417692222531,1417692222967,436
1417692215533,1417692215904,371
1417692219531,1417692219954,423
1417692215530,1417692215870,340
1417692217531,1417692218035,504
1417692207547,1417692207882,335
1417692208535,1417692208898,363
1417692207544,1417692208095,551
1417692208537,1417692208958,421
1417692226533,1417692226899,366
1417692224531,1417692224951,420
1417692225529,1417692225957,428
1417692216530,1417692216963,433
1417692223541,1417692223884,343
1417692223546,1417692223959,413
1417692222530,1417692222954,424
1417692208532,1417692208871,339
1417692207536,1417692207988,452
1417692226538,1417692226955,417
1417692220531,1417692220992,461
1417692209531,1417692209953,422
1417692226531,1417692226959,428
1417692217532,1417692217944,412
1417692210533,1417692210964,431
1417692221530,1417692221870,340
1417692216531,1417692216959,428
1417692207535,1417692208021,486
1417692223548,1417692223957,409
1417692216532,1417692216904,372
1417692214535,1417692215071,536
1417692217530,1417692217835,305
1417692213529,1417692213954,425
1417692210531,1417692210964,433
1417692212529,1417692212993,464
1417692213532,1417692213954,422
1417692215531,1417692215957,426
1417692210529,1417692210868,339
1417692218531,1417692219102,571
1417692225530,1417692225907,377
1417692208536,1417692208966,430
1417692218533,1417692219168,635
As System.out.println is synchronized, I add the timings to a ConcurrentHashMap and do not print them within RequestThread itself, so as not to skew the results. Are there other gotchas I should be aware of in the above code that could skew the results? Are there areas I should concentrate on to improve the accuracy, or is it already accurate enough, meaning accurate to approximately 100 milliseconds?
Sorry, I am a newbie. I have been looking for an example of how I can iterate over a HashMap and store the entries in a MySQL database. The code below downloads currency rates, which I would like to store in my database. The output is:
{CCY=USD, RATE=1.5875}
{CCY=EUR, RATE=1.1919}
{CCY=ALL, RATE=166.2959}
{CCY=AMD, RATE=645.4025}
How can I iterate over the HashMap and store it in my database? An illustration, or source code for a similar scenario, would be nice.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
public class Rate {
/** list of string array containing IOSCurrencyCodes*/
final String[] CURRENCY = new String[] { "USD","EUR","ALL","AMD",};
@SuppressWarnings("rawtypes")
void checkRateAllAtEnd() throws Exception {
List<Callable<HashMap>> tasks = new ArrayList<Callable<HashMap>>();
for (final String ccy : CURRENCY) {
tasks.add(new Callable<HashMap>() {
public HashMap<String, Comparable> call() throws Exception {
return getRates(ccy);
}
});
}
ExecutorService executorPool = Executors.newCachedThreadPool();
final List<Future<HashMap>> listRates = executorPool.invokeAll(tasks, 3600, TimeUnit.SECONDS);
for (Future<HashMap> rate : listRates) {
HashMap ccyRate = rate.get();
System.out.println(ccyRate);
}
}
@SuppressWarnings("rawtypes")
public HashMap<String, Comparable> getRates(String ccy) throws Exception {
URL url = new URL("http://finance.yahoo.com/d/quotes.csv?e=.csv&f=sl1d1t1&s=GBP"
+ ccy + "=X");
BufferedReader reader = new BufferedReader(new InputStreamReader(
url.openStream()));
String data = reader.readLine();
String[] dataItems = data.split(",");
Double rate = Double.valueOf(dataItems[1]);
HashMap<String, Comparable> ccyRate = new HashMap<String, Comparable>();
ccyRate.put("CCY", ccy);
ccyRate.put("RATE", rate);
return ccyRate;
}
public static void main(String[] args) {
Rate ccyRate = new Rate();
try {
//ccyConverter.checkRateSequential();
ccyRate.checkRateAllAtEnd();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
Map provides the entrySet method, which is very useful for iterating over a map.
Map#entrySet returns a Set view of the mappings contained in the map. The set is backed by the map, so changes to the map are reflected in the set, and vice versa. If the map is modified while an iteration over the set is in progress, the results of the iteration are undefined.
ref - the Map#entrySet Javadoc
Code -
HashMap<String, Comparable> ccyRate = new HashMap<String, Comparable>();
for (Map.Entry<String, Comparable> entry : ccyRate.entrySet()) {
    System.out.println(entry.getKey() + " - " + entry.getValue());
}
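To then store each downloaded rate in MySQL, a minimal JDBC sketch could look like the following. The connection URL, credentials, and the ccy_rate table with its two columns are assumptions for illustration only, not something from the original post:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.HashMap;

public class RateStore {
    // Hypothetical schema: CREATE TABLE ccy_rate (ccy VARCHAR(3), rate DOUBLE)
    private static final String DB_URL = "jdbc:mysql://localhost:3306/ratesdb";

    public void store(HashMap<String, Comparable> ccyRate) throws Exception {
        try (Connection conn = DriverManager.getConnection(DB_URL, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO ccy_rate (ccy, rate) VALUES (?, ?)")) {
            // Each map returned by getRates holds exactly two entries, keyed "CCY" and "RATE"
            ps.setString(1, (String) ccyRate.get("CCY"));
            ps.setDouble(2, (Double) ccyRate.get("RATE"));
            ps.executeUpdate();
        }
    }
}

You would call store(ccyRate) once per Future result inside the loop in checkRateAllAtEnd, or batch the inserts if the number of currencies grows.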