Spring batch : Job instances run sequentially when using annotaitons

Spring batch : Job instances run sequentially when using annotaitons - java

I have a simple annotation configuration for a Spring batch job as follows :
#Configuration
#EnableBatchProcessing
public abstract class AbstractFileLoader<T> {
private static final String FILE_PATTERN = "*.dat";
#Bean
#StepScope
#Value("#{stepExecutionContext['fileName']}")
public FlatFileItemReader<T> reader(String file) {
FlatFileItemReader<T> reader = new FlatFileItemReader<T>();
String path = file.substring(file.indexOf(":") + 1, file.length());
FileSystemResource resource = new FileSystemResource(path);
reader.setResource(resource);
DefaultLineMapper<T> lineMapper = new DefaultLineMapper<T>();
lineMapper.setFieldSetMapper(getFieldSetMapper());
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(",");
tokenizer.setNames(getColumnNames());
lineMapper.setLineTokenizer(tokenizer);
reader.setLineMapper(lineMapper);
reader.setLinesToSkip(1);
return reader;
}
#Bean
public ItemProcessor<T, T> processor() {
// TODO add transformations here
return null;
}
//Exception when using JobScope for the writer
#Bean
public ItemWriter<T> writer() {
ListItemWriter<T> writer = new ListItemWriter<T>();
return writer;
}
#Bean
public Job loaderJob(JobBuilderFactory jobs, Step s1,
JobExecutionListener listener) {
return jobs.get(getLoaderName()).incrementer(new RunIdIncrementer())
.listener(listener).start(s1).build();
}
#Bean
public Step readStep(StepBuilderFactory stepBuilderFactory,
ItemReader<T> reader, ItemWriter<T> writer,
ItemProcessor<T, T> processor, TaskExecutor taskExecutor,
ResourcePatternResolver resolver) {
final Step readerStep = stepBuilderFactory
.get(getLoaderName() + " ReadStep:slave").<T, T> chunk(25254)
.reader(reader).processor(processor).writer(writer)
.taskExecutor(taskExecutor).throttleLimit(16).build();
final Step partitionedStep = stepBuilderFactory
.get(getLoaderName() + " ReadStep:master")
.partitioner(readerStep)
.partitioner(getLoaderName() + " ReadStep:slave",
partitioner(resolver)).taskExecutor(taskExecutor)
.build();
return partitionedStep;
}
#Bean
public TaskExecutor taskExecutor() {
return new SimpleAsyncTaskExecutor();
}
#Bean
public Partitioner partitioner(
ResourcePatternResolver resourcePatternResolver) {
MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
Resource[] resources;
try {
resources = resourcePatternResolver.getResources("file:"
+ getFilesPath() + FILE_PATTERN);
} catch (IOException e) {
throw new RuntimeException(
"I/O problems when resolving the input file pattern.", e);
}
partitioner.setResources(resources);
return partitioner;
}
#Bean
public JobExecutionListener listener(ItemWriter<T> writer) {
/* org.springframework.batch.core.scope.StepScope scope; */
return new JobCompletionNotificationListener<T>(writer);
}
public abstract FieldSetMapper<T> getFieldSetMapper();
public abstract String getFilesPath();
public abstract String getLoaderName();
public abstract String[] getColumnNames();
}
When I run the same instance of the job with two different job parameters, both instances run sequentially instead of running in parallel. I have a SimpleAysncTaskExecutor bean configured which I assume should cause the jobs to be triggered asynchronously.
Do I need to add any more configuration to this class to have the job instances execute in parallel?

You have to configure the jobLauncher that you're using to launch jobs to use your TaskExecutor (or a separate pool). The simplest way is to override the bean:
#Bean
JobLauncher jobLauncher(JobRepository jobRepository) {
new SimpleJobLauncher(
taskExecutor: taskExecutor(),
jobRepository: jobRepository)
}
Don't be confused by the warning that will be logged saying that a synchronous task executor will be used. This is due to an extra instance that is created owing to the very awkward way Spring Batch uses to configure the beans it provides in SimpleBatchConfiguration (long story short, if you want to get rid of the warning you'll need to provide a BatchConfigurer bean and specify how 4 other beans are to be created, even if you want to change just one).
Note that it being the same job is irrelevant here. The problem is that by default the job launcher will launch the job on the same thread.

Related

Control is not going inside #KafkaListner method and no message is printing in java console

I have created messaging component which will be called by other service for consuming and sending message from kafka, producer part is working fine, I am not sure what wrong with the below consumer listner part why it not printing messages or in debug mode control also not going inside the #kafkaListner method, but GUI based kafkamanager app shows offset is got committed even thought its mannual offset commit.
Here is my Message listner class code , I have checked topic and groupid is setting and fetched properly
#Component
public class SpringKafkaMessageListner {
public CountDownLatch latch = new CountDownLatch(1);
#KafkaListener(topics = "#{consumerFactory.getConfigurationProperties().get(\"topic-name\")}",
groupId = "#{consumerFactory.getConfigurationProperties().get(\"group.id\")}",
containerFactory = "springKafkaListenerContainerFactory")
public void listen(ConsumerRecord<?, ?> consumerRecord, Acknowledgment ack) {
System.out.println("listening...");
System.out.println("Received Message in group : "
+ " and message: " + consumerRecord.value());
System.out.println("current offsetId : " + consumerRecord.offset());
ack.acknowledge();
latch.countDown();
}
}
Consumer config class-
#Configuration
#EnableKafka
public class KafkaConsumerBeanConfig<T> {
#Autowired
#Lazy
private KafkaConsumerConfigDTO kafkaConsumerConfigDTO;
#Bean
public ConsumerFactory<Object, T> consumerFactory() {
return new DefaultKafkaConsumerFactory<>(kafkaConsumerConfigDTO.getConfigs());
}
//for spring kafka with manual offset commit
#Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<Object,
springKafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<Object, T> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
//manual commit
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
return factory;
}
#Bean
SpringKafkaMessageListner consumerListner(){
return new SpringKafkaMessageListner();
}
}
Below code snippet is consumer interface implementation which expose subscribe() method and all other bean creation is done thru ConfigurableApplicationContext.
public class SpringKafkaConsumer<T> implements Consumer<T> {
public SpringKafkaConsumer(ConsumerConfig<T> consumerConfig,
ConfigurableApplicationContext context) {
this.consumerConfig = consumerConfig;
this.context = context;
this.consumerFactory = context.getBean("consumerFactory", ConsumerFactory.class);
this.springKafkaContainer = context.getBean("springKafkaListenerContainerFactory",
ConcurrentKafkaListenerContainerFactory.class);
}
// here is it just simple code to initialize SpringKafkaMessageListner class and invoking
listening part
#Override
public void subscribe() {
consumerListner = context.getBean("consumerListner", SpringKafkaMessageListner.class);
try {
consumerListner.latch.await(30, TimeUnit.SECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
Test class with my local docker kafka setup
#RunWith(SpringRunner.class)
#DirtiesContext
#ContextConfiguration(classes = QueueManagerSpringConfig.class)
public class SpringKafkaTest extends AbstractJUnit4SpringContextTests {
#Autowired
private QueueManager queueManager;
private Consumer<KafkaMessage> consumer;`
// test method
#Test
public void testSubscribeWithLocalBroker() {
String topicName = "topic1";
String brokerServer = "127.0.0.1:9092";
String groupId = "grp1";
Map<String, String> additionalProp = new HashMap<>();
additionalProp.put(KafkaConsumerConfig.GROUP_ID, groupId);
additionalProp.put(KafkaConsumerConfig.AUTO_COMMIT, "false");
additionalProp.put(KafkaConsumerConfig.AUTO_COMMIT_INTERVAL, "100");
ConsumerConfig<KafkaMessage> consumerConfig =
new ConsumerConfig.Builder<>(topicName, new KafkaSuccessMessageHandler(new
KafkaMessageSerializerTest()),
new KafkaMessageDeserializerTest())
.additionalProperties(additionalProp)
.enableSpringKafka(true)
.offsetPositionStrategy(new EarliestPositionStrategy())
.build();
consumer = queueManager.getConsumer(consumerConfig);
System.out.println("start subscriber");
// calling subcribe method of consumer that will invoke kafkalistner
consumer.subscribe();
}
#Configuration
public class QueueManagerSpringConfig {
#Bean
public QueueManager queueManager() {
Map<String, String> kafkaProperties = new HashMap<>();
kafkaProperties.put(KafkaPropertyNamespace.NS_PREFIX +
KafkaPropertyNamespace.BOOTSTRAP_SERVERS,
"127.0.0.1:9092");
return QueueManagerFactory.getInstance(new KafkaPropertyNamespace(kafkaProperties)); } }

Why does my spring batch takes step2 while step1 is still going on?

First of all, thank you for checking out my post.
I would list the techs that are used, what I need to achieve and what is happening.
Services used:
Spring Boot in Groovy
AWS: AWS Batch, S3Bucket, SNS, Parameter Store, CloudFormation
What I need to achieve:
Read two csv files from S3 Bucket(student.csv: id,studentName are columns and score.csv: id,studentId,score)
Check who is failed by comparing two files. If student’s score is below 50, he/she is failed.
Create a new csv file in S3 Bucket and store it as failedStudents.csv.
Send an email via SNS topic the failedStudents.csv that was just created.
In the Spring Batch Config class, I am calling the 1,2,3 as step1 and 4 as step2.
#Bean
Step step1() {
return this.stepBuilderFactory
.get("step1")
.<BadStudent, BadStudent>chunk(100)
.reader(new IteratorItemReader<Student>(this.StudentLoader.Students.iterator()) as ItemReader<? extends BadStudent>)
.processor(this.studentProcessor as ItemProcessor<? super BadStudent, ? extends BadStudent>)
.writer(this.csvWriter())
.build()
}
#Bean
Step step2() {
return this.stepBuilderFactory
.get("step2")
.tasklet(new PublishSnsTopic())
.build()
}
#Bean
Job job() {
return this.jobBuilderFactory
.get("scoring-students-batch")
.incrementer(new RunIdIncrementer())
.start(this.step1())
.next(this.step2())
.build()
}
ItemWriter
#Component
#EnableContextResourceLoader
class BadStudentWriter implements ItemWriter<BadStudent> {
#Autowired
ResourceLoader resourceLoader
#Resource
FileProperties fileProperties
WritableResource resource
PrintStream writer
CSVPrinter csvPrinter
#PostConstruct
void setup() {
this.resource = this.resourceLoader.getResource("s3://students/failedStudents.csv") as WritableResource
this.writer = new PrintStream(this.resource.outputStream)
this.csvPrinter = new CSVPrinter(this.writer, CSVFormat.DEFAULT.withDelimiter('|' as char).withHeader('id', 'studentsName', 'score'))
}
#Override
void write(List<? extends BadStudent> items) throws Exception {
this.csvPrinter.with {CSVPrinter csvPrinter ->
items.each {BadStudent badStudent ->
csvPrinter.printRecord(
badStudent.id,
badStudent.studentsName,
badStudent.score
)
}
}
}
#AfterStep
void aferStep() {
this.csvPrinter.close()
}
}
PublishSnsTopic
#Configuration
#Service
class PublishSnsTopic implements Tasklet {
#Autowired
ResourceLoader resourceLoader
List<BadStudent> badStudents
#Autowired
FileProperties fileProperties
#PostConstruct
void setup() {
String badStudentCSVFileName = "s3://students/failedStudents.csv"
Reader badStudentReader = new InputStreamReader(
this.resourceLoader.getResource(badStudentCSVFileName).inputStream
)
this.badStudnets = new CsvToBeanBuilder(badStudentReader)
.withSeparator((char)'|')
.withType(BadStudent)
.withFieldAsNull(CSVReaderNullFieldIndicator.BOTH)
.build()
.parse()
String messageBody = ""
messageBody += this.badStudents.collect {it -> return "${it.id}"}
SnsClient snsClient = SnsClient.builder().build()
if(snsClient) {
publishTopic(snsClient, messageBody, this.fileProperties.topicArn)
snsClient.close()
}
}
void publishTopic(SnsClient snsClient, String message, String arn) {
try {
PublishRequest request = PublishRequest.builder()
.message(message)
.topicArn(arn)
.build()
PublishResponse result = snsClient.publish(request)
} catch (SnsException e) {
log.error "SOMETHING WENT WRONG"
}
}
#Override
RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
return RepeatStatus.FINISHED
}
}
The Issue here is that, while step 1 is executing step2 also gets executed.
And it would lead to serious problem as
As this batch job runs everyday. Let’s say today is Tuesday, the failedStudents.csv is generated and stored fine. However while it is also running step2 instead of waiting for step1 to be done, step2 is send the failedStudents.csv that is generated on Monday. So csv is generated as new dataset. But Step2 is sending a day old students list.
For some reason, if failedStudents.csv is not stored in the s3 or turned into Glacier bucket, it would failed the batch job. Because in the Step2 it would crash due to FileNotFound Exception while Step1 is still going on and the failedStudents.csv is not created yet.
This is my first AWS and Spring Batch Project and so far, Step1 is working great perfectly. But I am guessing Step2 has a problem,
I appreciate reading my long post.
I would be very happy if anyone could answer or help me with this problem.
I have been fixing PublishSnsTopic and ItemWriter, but this is where I am stuck for a while, not sure how to figure this out.
Thank you so much again for reading.

all itemreader from each job are initliazed at startup

I have 2 jobs each one with 2 steps (each one with reader, processor, writer).
All is working well but when i launch job N°1 (command line with --spring.batch.job.names=job1Name), all the IteamReader are called (ItemReader from job N°1 and job N°2)
Log look like this :
start reader 1
start reader 2
start reader 3
start reader 4
From this Code (very simplified) for job 1 :
#Configuration
public class Job1Class
{
...
#Bean
public #NonNull Job job1(){
return jobBuilder.get("job1Name")
.start(step1())
.next(step2())
.build();
}
#Bean
public #NonNull Step step1()
{
return stepBuilder.get("step1")
.<MyClass, MyClass>chunk(1024)
.reader(reader1())
.processor(processor1())
.writer(writer1())
.build();
}
#Bean
public #NonNull Step step2()
{
return stepBuilder.get("step2")
.<MyClass, MyClass>chunk(1024)
.reader(reader2())
.processor(processor2())
.writer(writer2())
.build();
}
#Bean
public #NonNull ItemReader<MyClass> reader1()
{
log.debug("start reader 1");
//code
}
#Bean
public #NonNull ItemReader<MyClass> reader2()
{
log.debug("start reader 2");
//code
}
...
}
and the same for job2 :
#Configuration
public class Job2Class
{
...
#Bean
public #NonNull Job job2(){
return jobBuilder.get("job2Name")
.start(step3())
.next(step4())
.build();
}
#Bean
public #NonNull Step step3()
{
return stepBuilder.get("step3")
.<MyClass, MyClass>chunk(1024)
.reader(reader3())
.processor(processor3())
.writer(writer3())
.build();
}
#Bean
public #NonNull Step step4()
{
return stepBuilder.get("step4")
.<MyClass, MyClass>chunk(1024)
.reader(reader4())
.processor(processor4())
.writer(writer4())
.build();
}
#Bean
public #NonNull ItemReader<MyClass> reader3()
{
log.debug("start reader 3");
//code
}
#Bean
public #NonNull ItemReader<MyClass> reader4()
{
log.debug("start reader 4");
//code
}
...
}
I'm missing something ?
Thanks for your help.

When you start your Spring Boot application, all beans will be created and added to the application context (ie the bean definition methods will be called), that's why you see the log messages. But that does not mean all readers will be executed, only those of the specific job will be called at runtime.

Spring Kafka ChainedKafkaTransactionManager doesn't synchronize with JPA Spring-data transaction

I read a ton of Gary Russell answers and posts, but didn't find actual solution for the common use-case for synchronization of the sequence below:
recieve from topic A => save to DB via Spring-data => send to topic B
As i understand properly: there is no guarantee for fully atomic processing in that case and i need to deal with messages deduplication on the client side, but the main issue is that ChainedKafkaTransactionManager doesn't synchronize with JpaTransactionManager (see #KafkaListener below)
Kafka config:
#Production
#EnableKafka
#Configuration
#EnableTransactionManagement
public class KafkaConfig {
private static final Logger log = LoggerFactory.getLogger(KafkaConfig.class);
#Bean
public ConsumerFactory<String, byte[]> commonConsumerFactory(#Value("${kafka.broker}") String bootstrapServer) {
Map<String, Object> props = new HashMap<>();
props.put(BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
props.put(AUTO_OFFSET_RESET_CONFIG, 'earliest');
props.put(SESSION_TIMEOUT_MS_CONFIG, 10000);
props.put(ENABLE_AUTO_COMMIT_CONFIG, false);
props.put(MAX_POLL_RECORDS_CONFIG, 10);
props.put(MAX_POLL_INTERVAL_MS_CONFIG, 17000);
props.put(FETCH_MIN_BYTES_CONFIG, 1048576);
props.put(FETCH_MAX_WAIT_MS_CONFIG, 1000);
props.put(ISOLATION_LEVEL_CONFIG, 'read_committed');
props.put(KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class);
return new DefaultKafkaConsumerFactory<>(props);
}
#Bean
public ConcurrentKafkaListenerContainerFactory<String, byte[]> kafkaListenerContainerFactory(
#Qualifier("commonConsumerFactory") ConsumerFactory<String, byte[]> consumerFactory,
#Qualifier("chainedKafkaTM") ChainedKafkaTransactionManager chainedKafkaTM,
#Qualifier("kafkaTemplate") KafkaTemplate<String, byte[]> kafkaTemplate,
#Value("${kafka.concurrency:#{T(java.lang.Runtime).getRuntime().availableProcessors()}}") Integer concurrency
) {
ConcurrentKafkaListenerContainerFactory<String, byte[]> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.getContainerProperties().setMissingTopicsFatal(false);
factory.getContainerProperties().setTransactionManager(chainedKafkaTM);
factory.setConsumerFactory(consumerFactory);
factory.setBatchListener(true);
var arbp = new DefaultAfterRollbackProcessor<String, byte[]>(new FixedBackOff(1000L, 3));
arbp.setCommitRecovered(true);
arbp.setKafkaTemplate(kafkaTemplate);
factory.setAfterRollbackProcessor(arbp);
factory.setConcurrency(concurrency);
factory.afterPropertiesSet();
return factory;
}
#Bean
public ProducerFactory<String, byte[]> producerFactory(#Value("${kafka.broker}") String bootstrapServer) {
Map<String, Object> configProps = new HashMap<>();
configProps.put(BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
configProps.put(BATCH_SIZE_CONFIG, 16384);
configProps.put(ENABLE_IDEMPOTENCE_CONFIG, true);
configProps.put(KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
configProps.put(VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
var kafkaProducerFactory = new DefaultKafkaProducerFactory<String, byte[]>(configProps);
kafkaProducerFactory.setTransactionIdPrefix('kafka-tx-');
return kafkaProducerFactory;
}
#Bean
public KafkaTemplate<String, byte[]> kafkaTemplate(#Qualifier("producerFactory") ProducerFactory<String, byte[]> producerFactory) {
return new KafkaTemplate<>(producerFactory);
}
#Bean
public KafkaTransactionManager kafkaTransactionManager(#Qualifier("producerFactory") ProducerFactory<String, byte[]> producerFactory) {
KafkaTransactionManager ktm = new KafkaTransactionManager<>(producerFactory);
ktm.setTransactionSynchronization(SYNCHRONIZATION_ON_ACTUAL_TRANSACTION);
return ktm;
}
#Bean
public ChainedKafkaTransactionManager chainedKafkaTM(JpaTransactionManager jpaTransactionManager,
KafkaTransactionManager kafkaTransactionManager) {
return new ChainedKafkaTransactionManager(kafkaTransactionManager, jpaTransactionManager);
}
#Bean(name = "transactionManager")
public JpaTransactionManager transactionManager(EntityManagerFactory em) {
return new JpaTransactionManager(em);
}
}
Kafka listener:
#KafkaListener(groupId = "${group.id}", idIsGroup = false, topics = "${topic.name.import}")
public void consume(List<byte[]> records, #Header(KafkaHeaders.OFFSET) Long offset) {
for (byte[] record : records) {
// cause infinity rollback (perhaps due to batch listener)
if (true)
throw new RuntimeExcetion("foo");
// spring-data storage with #Transactional("chainedKafkaTM"), since Spring-data can't determine TM among transactionManager, chainedKafkaTM, kafkaTransactionManager
var result = storageService.persist(record);
kafkaTemplate.send(result);
}
}
Spring-kafka version: 2.3.3
Spring-boot version: 2.2.1
What is a proper way to implement such use-case ?
Spring-kafka documentation limited only to small/specific examples.
P.s. when i'm using #Transactional(transactionManager = "chainedKafkaTM", rollbackFor = Exception.class) on #KafkaListener method i am facing endless cyclic rollback, however FixedBackOff(1000L, 3L) is set.
EDIT: i'm planning to achieve max affordable synchronization between listener, producer and database with configurable retries num.
EDIT: Code snippets above edited with respect to advised configuration. Using ARBP doesn't solve infinity rollback cycle for me, since the first statement's predicate is always false (SeekUtils.doSeeks):
DefaultAfterRollbackProcessor
...
#Override
public void process(List<ConsumerRecord<K, V>> records, Consumer<K, V> consumer, Exception exception,
boolean recoverable) {
if (SeekUtils.doSeeks(((List) records), consumer, exception, recoverable,
getSkipPredicate((List) records, exception), LOGGER)
&& isCommitRecovered() && this.kafkaTemplate != null && this.kafkaTemplate.isTransactional()) {
ConsumerRecord<K, V> skipped = records.get(0);
this.kafkaTemplate.sendOffsetsToTransaction(
Collections.singletonMap(new TopicPartition(skipped.topic(), skipped.partition()),
new OffsetAndMetadata(skipped.offset() + 1)));
}
}
It is worth saying that there is no active transaction in Kafka Consumer method (TransactionSynchronizationManager.isActualTransactionActive()).

What makes you think it's not synchronized? You really don't need #Transactional since the container will start both transactions.
You shouldn't use a SeekToCurrentErrorHandler with transactions because that occurs within the transaction. Configure the after rollback processor instead. The default ARBP uses a FixedBackOff(0L, 9) (10 attempts).
This works fine for me; and stops after 4 delivery attempts:
#SpringBootApplication
public class So58804826Application {
public static void main(String[] args) {
SpringApplication.run(So58804826Application.class, args);
}
#Bean
public JpaTransactionManager transactionManager() {
return new JpaTransactionManager();
}
#Bean
public ChainedKafkaTransactionManager<?, ?> chainedTxM(JpaTransactionManager jpa,
KafkaTransactionManager<?, ?> kafka) {
kafka.setTransactionSynchronization(SYNCHRONIZATION_ON_ACTUAL_TRANSACTION);
return new ChainedKafkaTransactionManager<>(kafka, jpa);
}
#Autowired
private Saver saver;
#KafkaListener(id = "so58804826", topics = "so58804826")
public void listen(String in) {
System.out.println("Storing: " + in);
this.saver.save(in);
}
#Bean
public NewTopic topic() {
return TopicBuilder.name("so58804826")
.partitions(1)
.replicas(1)
.build();
}
#Bean
public ApplicationRunner runner(KafkaTemplate<String, String> template) {
return args -> {
// template.executeInTransaction(t -> t.send("so58804826", "foo"));
};
}
}
#Component
class ContainerFactoryConfigurer {
ContainerFactoryConfigurer(ConcurrentKafkaListenerContainerFactory<?, ?> factory,
ChainedKafkaTransactionManager<?, ?> tm) {
factory.getContainerProperties().setTransactionManager(tm);
factory.setAfterRollbackProcessor(new DefaultAfterRollbackProcessor<>(new FixedBackOff(1000L, 3)));
}
}
#Component
class Saver {
#Autowired
private MyEntityRepo repo;
private final AtomicInteger ids = new AtomicInteger();
#Transactional("chainedTxM")
public void save(String in) {
this.repo.save(new MyEntity(in, this.ids.incrementAndGet()));
throw new RuntimeException("foo");
}
}
I see "Participating in existing transaction" from both TxMs.
and with #Transactional("transactionManager"), I just get it from the JPATm, as one would expect.
EDIT
There is no concept of "recovery" for a batch listener - the framework has no idea which record in the batch needs to be skipped. In 2.3, we added a new feature for batch listeners when using MANUAL ack modes.
See Committing Offsets.
Starting with version 2.3, the Acknowledgment interface has two additional methods nack(long sleep) and nack(int index, long sleep). The first one is used with a record listener, the second with a batch listener. Calling the wrong method for your listener type will throw an IllegalStateException.
When using a batch listener, you can specify the index within the batch where the failure occurred. When nack() is called, offsets will be committed for records before the index and seeks are performed on the partitions for the failed and discarded records so that they will be redelivered on the next poll(). This is an improvement over the SeekToCurrentBatchErrorHandler, which can only seek the entire batch for redelivery.
However, the failed record will still be replayed indefinitely.
You could keep track of the record that keeps failing and nack index + 1 to skip over it.
However, since your JPA tx has rolled back; this won't work for you.
With batch listener's you must handle problems with batches in your listener code.

SELECT FOR UPDATE query with Spring's #Transaction management creates deadlock upon subsequent updates

Setup: Spring application deployed on Weblogic 12c, using JNDI lookup to get a datasource to the Oracle Database.
We have multiple services which will be polling the database regularly for new jobs. In order to prevent two services picking the same job we are using a native SELECT FOR UPDATE query in a CrudRepository. The application then takes the resulting job and updates it to PROCESSING instead of WAITING using the CrusRepository.save() method.
The problem is that I can't seem to get the save() to work within the FOR UPDATE transaction (at least this is my current working theory of what goes wrong), and as a result the entire polling freezes until the default 10 minute timeout occurs. I have tried putting #Transactional (with various propagation flags) basically everywhere, but I'm not able to get it to work (#EnableTransactionManagement is activated and working).
Obviously there must be some basic knowledge I'm missing. Is this even a possible setup? Unfortunately, just using #Transactional with a non-native CrudRepository SELECT query is not possible, as it apparently first makes a SELECT to see if the row is locked or not, and only then makes a new SELECT that locks it. Another service could very well pick up the same job in the meanwhile, which is why we need it to lock immediately.
Update in relation to #M. Deinum's comment.: I should perhaps also mention that it's a setup wherein the central component that's doing the polling is a library used by all the other services (therefore the library has #SpringBootApplication, as does each service using it, so double component scanning is certainly present). Furthermore, the service has two separate classes for polling depending on the type of service, with a lot of common code, shared in an AbstractTransactionHelper class. Below I've aggregated some code for the sake of brevity.
The library's main class:
#SpringBootApplication
#EnableTransactionManagement
#EnableJpaRepositories
public class JobsMain {
public static void initializeJobsMain(){
PersistenceProviderResolverHolder.setPersistenceProviderResolver(new PersistenceProviderResolver() {
#Override
public List<PersistenceProvider> getPersistenceProviders() {
return Collections.singletonList(new HibernatePersistenceProvider());
}
#Override
public void clearCachedProviders() {
//Not quite sure what this should do...
}
});
}
#Bean
public JtaTransactionManager transactionManager(){
return new WebLogicJtaTransactionManager();
}
public DataSource dataSource(){
final JndiDataSourceLookup dsLookup = new JndiDataSourceLookup();
dsLookup.setResourceRef(true);
DataSource dataSource = dsLookup.getDataSource("Jobs");
return dataSource;
}
}
The repository (we're returning a set with only one job as we had some other issues when returning a single object):
public interface JobRepository extends CrudRepository<Job, Integer> {
#Query(value = "SELECT * FROM JOB WHERE JOB.ID IN "
+ "(SELECT ID FROM "
+ "(SELECT * FROM JOB WHERE "
+ "JOB.STATUS = :status1 OR "
+ "JOB.STATUS = :status2 "
+ "ORDER BY JOB.PRIORITY ASC, JOB.CREATED ASC) "
+ "WHERE ROWNUM <= 1) "
+ "FOR UPDATE", nativeQuery = true)
public Set<Job> getNextJob(#Param("status1") String status1, #Param("status2") String status2);
The transaction handling class:
#Service
public class JobManagerTransactionHelper extends AbstractTransactionHelper{
#Transactional
#Override
public QdbJob getNextJobToProcess(){
Set<Job> jobs = null;
try {
jobs = jobRepo.getNextJob(Status.DONE.name(), Status.FAILED.name());
} catch (Exception ex) {
logger.error(ex);
}
return extractSingleJobFromSet(jobs);
}
Update 2: Some more code.
AbstractTransactionHelper:
#Service
public abstract class AbstractTransactionHelper {
#Autowired
QdbJobRepository jobRepo;
#Autowired
ArchivedJobRepository archive;
protected Job extractSingleJobFromSet(Set<Job> jobs){
Job job = null;
if(jobs != null && !jobs.isEmpty()){
for(job job : jobs){
if(this instanceof JobManagerTransactionHelper){
updateJob(job);
}
job = job;
}
}
return job;
}
protected void updateJob(Job job){
updateJob(job, Status.PROCESSING, null);
}
protected void updateJob(Job job, Status status, String serviceMessage){
if(job != null){
if(status != null){
job.setStatus(status);
}
if(serviceMessage != null){
job.setServiceMessage(serviceMessage);
}
saveJob(job);
}
}
protected void saveJob(Job job){
jobRepo.save(job);
archive.save(Job.convertJobToArchivedJob(job));
}
Update 4: Threading. newJob() is implemented by each service that uses the library.
#Service
public class JobManager{
#Autowired
private JobManagerTransactionHelper transactionHelper;
#Autowired
JobListener jobListener;
#Autowired
Config config;
protected final AtomicInteger atomicThreadCounter = new AtomicInteger(0);
protected boolean keepPolling;
protected Future<?> futurePoller;
protected ScheduledExecutorService pollService;
protected ThreadPoolExecutor threadPool;
public boolean start(){
if(!keepPolling){
ThreadFactory pollServiceThreadFactory = new ThreadFactoryBuilder()
.setNamePrefix(config.getService() + "ScheduledPollingPool-Thread").build();
ThreadFactory threadPoolThreadFactory = new ThreadFactoryBuilder()
.setNamePrefix(config.getService() + "ThreadPool-Thread").build();
keepPolling = true;
pollService = Executors.newSingleThreadScheduledExecutor(pollServiceThreadFactory);
threadPool = (ThreadPoolExecutor)Executors.newFixedThreadPool(getConfig().getThreadPoolSize(), threadPoolThreadFactory);
futurePoller = pollService.scheduleWithFixedDelay(getPollTask(), 0, getConfig().getPollingFrequency(), TimeUnit.MILLISECONDS);
return true;
}else{
return false;
}
}
protected Runnable getPollTask() {
return new Runnable(){
public void run(){
try{
while(atomicThreadCounter.get() < threadPool.getMaximumPoolSize() &&
threadPool.getActiveCount() < threadPool.getMaximumPoolSize() &&
keepPolling == true){
Job job = transactionHelper.getNextJobToProcess();
if(job != null){
threadPool.submit(getJobHandler(job));
atomicThreadCounter.incrementAndGet();//threadPool.getActiveCount() isn't updated fast enough the first loop
}else{
break;
}
}
}catch(Exception e){
logger.error(e);
}
}
};
}
protected Runnable getJobHandler(final Job job){
return new Runnable(){
public void run(){
try{
atomicThreadCounter.decrementAndGet();
jobListener.newJob(job);
}catch(Exception e){
logger.error(e);
}
}
};
}

As it turns out, the problem was the WeblogicJtaTransactionManager. My guess is that the FOR UPDATE resulted in a JPA transaction, but upon updating the object in the database, the WeblogicJtaTransactionManager was used, which failed to find an ongoing JTA transaction. Since we're deploying on Weblogic we wrongly assumed we had to use the WeblogicJtaTransactionManager.
Either way, exchanging the TransactionManager to a JpaTransactionManager (and explicitly setting the EntityManagerFactory and DataSource on it) basically solved all problems.
#Bean
public PlatformTransactionManager transactionManager() {
JpaTransactionManager jpaTransactionManager = new JpaTransactionManager(entityManagerFactory().getObject());
jpaTransactionManager.setDataSource(dataSource());
jpaTransactionManager.setJpaDialect(new HibernateJpaDialect());
return jpaTransactionManager;
}
Assuming you also have added an EntityManagerFactoryBean which is needed if you want to use multiple datasources in the same project (which we're doing, but not within single transactions, so no need for JTA).
#Bean
public LocalContainerEntityManagerFactoryBean entityManagerFactory() {
HibernateJpaVendorAdapter vendorAdapter = new HibernateJpaVendorAdapter();
LocalContainerEntityManagerFactoryBean factoryBean = new LocalContainerEntityManagerFactoryBean();
factoryBean.setDataSource(dataSource());
factoryBean.setJpaVendorAdapter(vendorAdapter);
factoryBean.setPackagesToScan("my.model");
return factoryBean;
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Spring batch : Job instances run sequentially when using annotaitons - java

Related

Control is not going inside #KafkaListner method and no message is printing in java console

Why does my spring batch takes step2 while step1 is still going on?

all itemreader from each job are initliazed at startup

Spring Kafka ChainedKafkaTransactionManager doesn't synchronize with JPA Spring-data transaction

SELECT FOR UPDATE query with Spring's #Transaction management creates deadlock upon subsequent updates

Categories

Resources