Spring Batch is used to create and process batch jobs. It provides features such as logging, job statistics, transaction management, and job restarts. It is very helpful for processing large datasets, as long as the volume of data is finite.
In this tutorial we will learn how to create and execute a Spring Batch job. In our example we will create a job that imports all the words from a text file into a database and then, at the end, prints the total number of words available in the database.
Below is the project structure.
Creating batch job
Sample text file to import
Below is the content of the text file we use for importing the words.
The list below gives you the 1000 most frequently used English words in alphabetical order.
Once you've mastered the shorter vocabulary lists, this is the next step.
It would take time to learn the entire list from scratch, but you are probably already familiar with some of these words.
Feel free to copy this list into your online flashcard management tool, an app, or print it out to make paper flashcards.
You will have to look up the definitions on your own either in English or in your own language. Good luck improving your English vocabulary!
a
ability
able
about
above
accept
according
account
Maven dependency
We need to add the below dependencies for Spring Batch, the H2 database, and Spring Data JPA.
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
application.properties configuration
The below properties are added to configure the database. The generate-ddl property is used to create the tables automatically as per the defined entity beans.
spring.datasource.url=jdbc:h2:mem:app-data
spring.datasource.jdbcUrl=jdbc:h2:mem:app-data
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=sa
spring.datasource.password=
spring.jpa.database-platform=org.hibernate.dialect.H2Dialect
spring.jpa.generate-ddl=true
The below property enables the H2 database console, so we can check the tables and other objects like in any SQL editor.
spring.h2.console.enabled=true
The below property needs to be set if you don't want your batch jobs to run automatically on every start of the application; otherwise, by default, all the defined jobs will run each time the application starts.
spring.batch.job.enabled=false
Data source configuration
In this example we will use the same database for both the application and the batch job. You can check my other post on how to use multiple data sources with a Spring Boot and batch application: Multiple data source with Spring boot, batch and cloud task
Data source and repository bean configuration
@Configuration
@EnableJpaRepositories(
        entityManagerFactoryRef = "appEntityManagerFactory",
        basePackages = "com.ttj.app.repository"
)
@EnableTransactionManagement
public class AppDataSourceConfig {

    @Bean
    @ConfigurationProperties(prefix = "spring.datasource")
    public DataSource appDataSource() {
        return DataSourceBuilder.create().build();
    }

    @Bean(name = "appEntityManagerFactory")
    public LocalContainerEntityManagerFactoryBean appEntityManagerFactory(EntityManagerFactoryBuilder builder,
            @Qualifier("appDataSource") DataSource appDataSource) {
        return builder
                .dataSource(appDataSource)
                .packages("com.ttj.app.domain")
                .persistenceUnit("app")
                .build();
    }
}
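Note that @EnableTransactionManagement is declared above, but no transaction manager bean is shown. If Spring Boot's auto-configured transaction manager does not pick up the custom entity manager factory, a bean like the following sketch can be added to the same configuration class (the bean name and wiring here are assumptions, not part of the original configuration; verify against your Boot version):

```java
// Hypothetical sketch: a JPA transaction manager bound to the custom
// entity manager factory defined above.
@Bean(name = "transactionManager")
public PlatformTransactionManager transactionManager(
        @Qualifier("appEntityManagerFactory") EntityManagerFactory appEntityManagerFactory) {
    return new JpaTransactionManager(appEntityManagerFactory);
}
```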
Repository class
package com.ttj.app.repository;
import com.ttj.app.domain.Word;
import org.springframework.data.repository.CrudRepository;
public interface WordRepository extends CrudRepository<Word, Long> {}
Domain object (Entity)
package com.ttj.app.domain;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;
@Entity
@Table(name = "WORDS")
public class Word {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column
    private String text;

    @Enumerated(EnumType.STRING)
    private Language language;

    // constructors (including the Word(String text, Language language) constructor
    // used by the processor, plus the no-arg constructor JPA requires),
    // getter and setter methods omitted for brevity
}
Below is the enum class used by the above entity class.
public enum Language {
EN, HI;
}
Batch job configuration
A batch job is a collection of steps which execute in a specified order. For example, in our case we can list the below tasks for our job.
- Import Words
- Read a text file line by line.
- Extract the words from each line.
- Write the words in chunks to the database.
- Finally, print the total number of words available in the database.
Steps can be created in two ways: one is using a Tasklet, and the other is using a chain of reader, processor and writer. We will cover the first three tasks in one step using the reader, processor and writer, and for the last step we will use a Tasklet.
Step 1 - Import words
In this step we want to perform a chain of tasks: read the file, extract the words, and then write them to the database. So for this step we will use the reader, processor and writer implementations to create the step.
The below bean defines the reader, which reads the text file line by line from the classpath. In the mapper we could also create some other object from each line, but here we read each line as a plain string.
@Bean
public FlatFileItemReader<String> reader() {
    return new FlatFileItemReaderBuilder<String>()
            .name("fileReader")
            .resource(new ClassPathResource("words.txt"))
            .lineMapper(new LineMapper<String>() {
                @Override
                public String mapLine(String line, int lineNumber) throws Exception {
                    return line;
                }
            })
            .build();
}
In the below processor definition we transform a single line into a list of Word objects.
@Bean
public ItemProcessor<String, List<Word>> processor() {
    return new ItemProcessor<String, List<Word>>() {
        @Override
        public List<Word> process(String line) throws Exception {
            if (line != null && line.length() > 0) {
                // split on whitespace, commas, '=', '.' and '*'
                String[] tokens = line.split("[\\s,=\\.*]");
                if (tokens.length > 0) {
                    List<Word> list = new ArrayList<>();
                    for (String token : tokens) {
                        if (token != null && token.length() > 0) {
                            list.add(new Word(token, Language.EN));
                        }
                    }
                    return list;
                }
            }
            return null;
        }
    };
}
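The tokenizing logic of the processor can be exercised outside Spring. A minimal, self-contained sketch (the WordSplitter class name is made up for illustration) that reproduces the same split regex:

```java
import java.util.ArrayList;
import java.util.List;

public class WordSplitter {

    // Same regex as the processor: split on whitespace, commas, '=', '.' and '*',
    // then drop the empty tokens that adjacent delimiters produce.
    static List<String> extractWords(String line) {
        List<String> words = new ArrayList<>();
        if (line != null && !line.isEmpty()) {
            for (String token : line.split("[\\s,=\\.*]")) {
                if (!token.isEmpty()) {
                    words.add(token);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // "ability, able.about" tokenizes into three words
        System.out.println(extractWords("ability, able.about"));
    }
}
```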
Our writer receives the list of items returned by the processor. Since the processor transforms each line into a list of words, the writer gets a list of lists of words to process in chunks. The chunk size is defined in the step configuration, which we will see later.
@Bean
public ItemWriter<List<Word>> writer(@Qualifier("appEntityManagerFactory") EntityManagerFactory appEntityManagerFactory) {
    return new ItemWriter<List<Word>>() {
        @Override
        @Transactional
        public void write(List<? extends List<Word>> items) {
            items.forEach(item -> wordRepository.saveAll(item));
        }
    };
}
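The job configuration below refers to an importStep bean that chains the reader, processor and writer together, but its definition is not shown in the post. A sketch of what it would look like (the chunk size of 10 is an assumption; the repository linked at the end has the authoritative version):

```java
// Sketch: chunk-oriented step wiring reader, processor and writer.
// The chunk size (10) is an assumed value, not taken from the original post.
@Bean
public Step importStep(FlatFileItemReader<String> reader,
                       ItemProcessor<String, List<Word>> processor,
                       ItemWriter<List<Word>> writer) {
    return stepBuilderFactory.get("importStep")
            .<String, List<Word>>chunk(10)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}
```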
Step 2 - Print total count of words
In this step we need to print the count of the total words in the database after the import, so we will create a Tasklet bean as given below.
@Bean
public Step totalCountStep() {
    return stepBuilderFactory.get("totalCountStep")
            .tasklet(new Tasklet() {
                @Override
                public RepeatStatus execute(StepContribution contribution,
                        ChunkContext chunkContext) throws Exception {
                    System.out.println("Total word count: " + wordRepository.count());
                    return RepeatStatus.FINISHED;
                }
            }).build();
}
Create the job configuration using above steps
Now we will define the bean for our job using the above steps.
@Bean
public Job importWordsJob(Step importStep, Step totalCountStep) {
    return jobBuilderFactory.get("importWordsJob")
            .incrementer(new RunIdIncrementer())
            .flow(importStep)
            .next(totalCountStep)
            .end()
            .build();
}
Autowiring required dependencies
Below are our dependencies on the job builder factory, step builder factory and repository class, which are required to define the above beans. We don't need to define these beans ourselves, as Spring already provides them when auto-configuration is enabled.
@Autowired
private JobBuilderFactory jobBuilderFactory;

@Autowired
private StepBuilderFactory stepBuilderFactory;

@Autowired
private WordRepository wordRepository;
Spring boot main class annotations
Below is our main class with the required annotations. We have used the @EnableBatchProcessing annotation so that Spring configures the beans required for batch processing, such as the builder factories.
@EnableBatchProcessing
@SpringBootApplication
@ComponentScan("com.ttj")
public class BatchTutorialApplication {

    public static void main(String[] args) {
        SpringApplication.run(BatchTutorialApplication.class, args);
    }
}
Executing batch job
There are multiple ways to run a batch job, such as enabling job execution on application startup or registering the job with a Spring Cloud Data Flow server. See the below links on how to set up a Data Flow server and execute a batch job with it: Setup Spring Cloud Data Flow Server
Spring batch job execution with Spring cloud data flow server
Another way is to execute the job using the JobLauncher, which we will see in this example. We will create a REST service endpoint that invokes the batch job; this service URL can be called from any browser or HTTP client.
REST Service class to execute the batch job using web URL
Below is the code of our REST service, which has the job launcher and the job bean autowired to execute the job via the launcher. Here we pass a job parameter with a date string, which is only used to execute the job with a unique parameter every time; otherwise this job would execute only once while the application is running.
@RestController
@RequestMapping("/jobs")
public class JobController {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired
    private Job importWordsJob;

    @GetMapping("/importWords")
    public void runJob() {
        try {
            JobParametersBuilder builder = new JobParametersBuilder();
            builder.addString("startDate", LocalDateTime.now().toString());
            jobLauncher.run(importWordsJob, builder.toJobParameters());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Now our service is ready to run and execute the batch job. Execute the below Maven command in the project root directory to run the application.
mvn clean spring-boot:run
Now hit the service URL http://localhost:8080/jobs/importWords in your web browser.
You will see the below result in the application log or console.
2019-12-21 16:38:58.542 INFO 7156 --- [nio-8080-exec-1] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=importWordsJob]] launched with the following parameters: [{startDate=2019-12-21T16:38:58.495}]
2019-12-21 16:38:58.576 WARN 7156 --- [nio-8080-exec-1] o.s.c.t.b.l.TaskBatchExecutionListener : This job was executed outside the scope of a task but still used the task listener.
2019-12-21 16:38:58.587 INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.job.SimpleStepHandler : Executing step: [importStep]
2019-12-21 16:38:58.857 INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.step.AbstractStep : Step: [importStep] executed in 270ms
2019-12-21 16:38:58.875 INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.job.SimpleStepHandler : Executing step: [totalCountStep]
Total word count: 104
2019-12-21 16:38:59.011 INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.step.AbstractStep : Step: [totalCountStep] executed in 135ms
2019-12-21 16:38:59.018 INFO 7156 --- [nio-8080-exec-1] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=importWordsJob]] completed with the following parameters: [{startDate=2019-12-21T16:38:58.495}] and the following status: [COMPLETED] in 449ms
Now we will execute this job one more time using the same service URL, and you will see the below lines added to the logs.
2019-12-21 16:47:17.261 INFO 7156 --- [nio-8080-exec-4] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=importWordsJob]] launched with the following parameters: [{startDate=2019-12-21T16:47:17.252}]
2019-12-21 16:47:17.264 WARN 7156 --- [nio-8080-exec-4] o.s.c.t.b.l.TaskBatchExecutionListener : This job was executed outside the scope of a task but still used the task listener.
2019-12-21 16:47:17.271 INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.job.SimpleStepHandler : Executing step: [importStep]
2019-12-21 16:47:17.322 INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.step.AbstractStep : Step: [importStep] executed in 51ms
2019-12-21 16:47:17.327 INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.job.SimpleStepHandler : Executing step: [totalCountStep]
Total word count: 208
2019-12-21 16:47:17.336 INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.step.AbstractStep : Step: [totalCountStep] executed in 9ms
2019-12-21 16:47:17.338 INFO 7156 --- [nio-8080-exec-4] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=importWordsJob]] completed with the following parameters: [{startDate=2019-12-21T16:47:17.252}] and the following status: [COMPLETED] in 75ms
Git source code
You can find the complete source code at the below Git location. This source code also includes the multiple data source configuration with the cloud task configuration.
https://github.com/thetechnojournals/spring-tutorials/tree/master/SpringBatchTutorial