
Spring Batch tutorial with example




Spring Batch is used to create and process batch jobs. It provides features such as logging, job statistics, transaction management, and job restart. It is very helpful for processing large datasets with a finite volume of data.
In this tutorial we will learn how to create and execute a Spring Batch job. In our example we will create a job that imports all the words from a text file into a database and then prints the total number of words available in the database.
Below is the project structure.

[Image: Spring Batch project structure]
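
The original screenshot is not reproduced here; based on the packages and files used in this tutorial, the layout is roughly as follows (package names for the config and controller classes are inferred, not confirmed by the post):

src/main/java/com/ttj/app
├── BatchTutorialApplication.java
├── config  (AppDataSourceConfig, batch job configuration)
├── controller  (JobController)
├── domain  (Word, Language)
└── repository  (WordRepository)
src/main/resources
├── application.properties
└── words.txt
pom.xml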

Creating the batch job

Sample text file to import

Below are the contents of the text file we use for importing the words.
The list below gives you the 1000 most frequently used English words in alphabetical order.
Once you've mastered the shorter vocabulary lists, this is the next step.
It would take time to learn the entire list from scratch, but you are probably already familiar with some of these words.
Feel free to copy this list into your online flashcard management tool, an app, or print it out to make paper flashcards.
You will have to look up the definitions on your own either in English or in your own language. Good luck improving your English vocabulary!

a
ability
able
about
above
accept
according
account

Maven dependencies

We need to add the below dependencies for Spring Batch, the H2 database, and Spring Data JPA.
      <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-batch</artifactId>
      </dependency>

      <dependency>
        <groupId>com.h2database</groupId>
        <artifactId>h2</artifactId>
        <scope>runtime</scope>
      </dependency>

      <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
      </dependency>
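
Since we later expose a REST endpoint to launch the job, the web starter is also required (add it if it is not already on your classpath):

      <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
      </dependency>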

application.properties configuration

The below properties configure the data source. The generate-ddl property creates the tables automatically from the defined entity beans. Note that the JDBC URL appears twice: spring.datasource.url is read by Spring Boot's auto-configuration, while spring.datasource.jdbcUrl is the property name HikariCP expects when we build the data source ourselves with DataSourceBuilder (shown later).
spring.datasource.url=jdbc:h2:mem:app-data
spring.datasource.jdbcUrl=jdbc:h2:mem:app-data
spring.datasource.driverClassName=org.h2.Driver
spring.datasource.username=sa
spring.datasource.password=

spring.jpa.database-platform=org.hibernate.dialect.H2Dialect
spring.jpa.generate-ddl=true
The below property enables the H2 database console, so we can inspect tables and other objects as in any SQL editor (by default the console is served at /h2-console).
spring.h2.console.enabled=true
Add the below property if you don't want your batch jobs to run automatically on every application start; otherwise, by default, all defined jobs run each time the application starts.
spring.batch.job.enabled=false

Data source configuration

In this example we will use the same database for both the application and the batch job. You can check my other post on how to use multiple data sources with a Spring Boot and batch application.


Data source and repository bean configuration

@Configuration
@EnableJpaRepositories(
        entityManagerFactoryRef = "appEntityManagerFactory",
        basePackages = "com.ttj.app.repository"
)
@EnableTransactionManagement
public class AppDataSourceConfig {

    @Bean
    @ConfigurationProperties(prefix = "spring.datasource")
    public DataSource appDataSource(){
        return DataSourceBuilder.create().build();
    }

    @Bean(name = "appEntityManagerFactory")
    public LocalContainerEntityManagerFactoryBean appEntityManagerFactory(EntityManagerFactoryBuilder builder,
            @Qualifier("appDataSource") DataSource appDataSource){

        return builder
                .dataSource(appDataSource)
                .packages("com.ttj.app.domain")
                .persistenceUnit("app")
                .build();
    }
}
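
The @EnableTransactionManagement annotation expects a transaction manager bean. Spring Boot usually auto-configures one, but when you define your own entity manager factory it can help to declare a JpaTransactionManager explicitly. A minimal sketch (the bean name transactionManager is the default looked up by @EnableJpaRepositories):

    @Bean(name = "transactionManager")
    public PlatformTransactionManager appTransactionManager(
            @Qualifier("appEntityManagerFactory") EntityManagerFactory entityManagerFactory) {
        // Bind JPA transactions to the application's entity manager factory
        return new JpaTransactionManager(entityManagerFactory);
    }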

Repository class

package com.ttj.app.repository;

import com.ttj.app.domain.Word;
import org.springframework.data.repository.CrudRepository;

public interface WordRepository extends CrudRepository<Word, Long> {}

Domain object (Entity)

package com.ttj.app.domain;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EnumType;
import javax.persistence.Enumerated;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name="WORDS")
public class Word {
    
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column
    private String text;
    
    @Enumerated(EnumType.STRING)
    private Language language;
    
    // JPA requires a no-arg constructor
    protected Word() {}

    // Convenience constructor used by the item processor below
    public Word(String text, Language language) {
        this.text = text;
        this.language = language;
    }

    //getter methods
    //setter methods

}
Below is the enum class used by the above entity class.
public enum Language {
    EN, HI;
}

Batch job configuration

A batch job is a collection of steps executed in a specified order. For our job we can list the below steps:
  1. Import words
    1. Read the text file line by line.
    2. Extract the words from each line.
    3. Write the words to the database in chunks.
  2. Finally, print the total number of words available in the database.

Steps can be created in two ways: using a Tasklet, or using a chain of reader, processor, and writer. We will build the first step (which covers the three import sub-tasks) with a reader/processor/writer chain, and the last step with a Tasklet.

Step 1 - Import words

In this step we want to perform a chain of tasks: read the file, extract the words, and write them to the database. So we will use reader, processor, and writer implementations to build the step.
The below bean defines the reader, which reads the text file line by line from the classpath. The line mapper could also transform each line into some other object, but here we keep each line as a plain string.
    @Bean
    public FlatFileItemReader<String> reader() {
        return new FlatFileItemReaderBuilder<String>()
                .name("fileReader")
                // words.txt is loaded from the classpath (src/main/resources)
                .resource(new ClassPathResource("words.txt"))
                // Pass each line through unchanged; the mapper could also build richer objects
                .lineMapper(new LineMapper<String>() {
                    @Override
                    public String mapLine(String line, int lineNumber) throws Exception {
                        return line;
                    }
                })
                .build();
    }
The below processor transforms a single line into a list of Word objects.
    @Bean
    public ItemProcessor<String, List<Word>> processor() {
        return new ItemProcessor<String, List<Word>>() {
            @Override
            public List<Word> process(String line) throws Exception {
                if (line != null && line.length() > 0) {
                    // Split the line on whitespace and common punctuation
                    String[] tokens = line.split("[\\s,=\\.*]");
                    List<Word> list = new ArrayList<>();
                    for (String token : tokens) {
                        if (!token.isEmpty())
                            list.add(new Word(token, Language.EN));
                    }
                    if (!list.isEmpty())
                        return list;
                }
                // Returning null tells Spring Batch to filter the item out
                return null;
            }
        };
    }
The writer receives the items returned by the processor. Since the processor transforms each line into a list of words, the writer receives a list of lists of words and processes them in chunks. The chunk size is defined in the step configuration, which is shown below.
    @Bean
    public ItemWriter<List<Word>> writer() {
        return new ItemWriter<List<Word>>() {
            @Override
            public void write(List<? extends List<Word>> items) {
                // Each item is the list of words extracted from one line.
                // The chunk-oriented step already wraps this call in a transaction,
                // so no extra @Transactional is needed here.
                items.forEach(item -> wordRepository.saveAll(item));
            }
        };
    }
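
The job configured below references an importStep bean that ties these pieces together; the original post does not show it, so here is a minimal sketch (the chunk size of 10 is an arbitrary choice):

    @Bean
    public Step importStep(FlatFileItemReader<String> reader,
                           ItemProcessor<String, List<Word>> processor,
                           ItemWriter<List<Word>> writer) {
        // Chunk-oriented step: lines are read and processed one at a time
        // and written in chunks of 10, each chunk within one transaction
        return stepBuilderFactory.get("importStep")
                .<String, List<Word>>chunk(10)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }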

Step 2 - Print total count of words

In this step we print the total count of words in the database after the import, so we create a Tasklet-based step as given below.
    @Bean
    public Step totalCountStep(){
        return stepBuilderFactory.get("totalCountStep")
                .tasklet(new Tasklet() {

                    @Override
                    public RepeatStatus execute(StepContribution contribution,
                                                ChunkContext chunkContext) throws Exception {

                        System.out.println("Total word count: "+wordRepository.count());
                        return RepeatStatus.FINISHED;
                    }
                }).build();
    }

Create the job configuration using the above steps

Now we will define the bean for our Job using the above steps.
    @Bean
    public Job importWordsJob(Step importStep, Step totalCountStep) {
        return jobBuilderFactory.get("importWordsJob")
                .incrementer(new RunIdIncrementer())
                .flow(importStep)
                .next(totalCountStep)
                .end()
                .build();
    }

Autowiring required dependencies

Below are the dependencies for the job builder factory, step builder factory, and repository class, which are required to define the above beans. We don't need to define these beans ourselves; Spring provides them once batch processing is enabled (see the @EnableBatchProcessing annotation below).
    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private WordRepository wordRepository;

Spring boot main class annotations

Below is our main class with the required annotations. The @EnableBatchProcessing annotation configures the required batch beans, such as the builder factories used above.
@EnableBatchProcessing
@SpringBootApplication
@ComponentScan("com.ttj")
public class BatchTutorialApplication {

    public static void main(String[] args) {
        SpringApplication.run(BatchTutorialApplication.class, args);
    }
}

Executing batch job

There are multiple ways to run the batch job, such as enabling job execution on application startup or registering the job with a Spring Cloud Data Flow server.

Another way is to use the JobLauncher, which we will use in this example. We will create a REST service endpoint that invokes the batch job, so the job can be triggered from any browser or HTTP client.

REST Service class to execute the batch job using web URL

Below is the code of our REST service, which has the job launcher and job bean autowired so it can execute the job via the launcher. We pass a job parameter containing the current date-time string so that the job runs with unique parameters each time; otherwise a job instance with the same parameters will execute only once while the application is running.
@RestController
@RequestMapping("/jobs")
public class JobController {

    @Autowired
    JobLauncher jobLauncher;

    @Autowired
    private Job importWordsJob;

    @GetMapping("/importWords")
    public void runJob(){
        try {
            JobParametersBuilder builder = new JobParametersBuilder();
            builder.addString("startDate", LocalDateTime.now().toString());

            jobLauncher.run(importWordsJob, builder.toJobParameters());
        }catch(Exception e){
            e.printStackTrace();
        }
    }
}
Now our service is ready to run and execute the batch job. Execute the below command in the project root directory to run the application.
mvn clean spring-boot:run
Now hit the service URL http://localhost:8080/jobs/importWords in your web browser.
You will see the below result in the application log or console.
2019-12-21 16:38:58.542  INFO 7156 --- [nio-8080-exec-1] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importWordsJob]] launched with the following parameters: [{startDate=2019-12-21T16:38:58.495}]
2019-12-21 16:38:58.576  WARN 7156 --- [nio-8080-exec-1] o.s.c.t.b.l.TaskBatchExecutionListener   : This job was executed outside the scope of a task but still used the task listener.
2019-12-21 16:38:58.587  INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.job.SimpleStepHandler     : Executing step: [importStep]
2019-12-21 16:38:58.857  INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.step.AbstractStep         : Step: [importStep] executed in 270ms
2019-12-21 16:38:58.875  INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.job.SimpleStepHandler     : Executing step: [totalCountStep]
Total word count: 104
2019-12-21 16:38:59.011  INFO 7156 --- [nio-8080-exec-1] o.s.batch.core.step.AbstractStep         : Step: [totalCountStep] executed in 135ms
2019-12-21 16:38:59.018  INFO 7156 --- [nio-8080-exec-1] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importWordsJob]] completed with the following parameters: [{startDate=2019-12-21T16:38:58.495}] and the following status: [COMPLETED] in 449ms
Now we will execute this job one more time using the same service URL, and you will see the below log lines added.
2019-12-21 16:47:17.261  INFO 7156 --- [nio-8080-exec-4] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importWordsJob]] launched with the following parameters: [{startDate=2019-12-21T16:47:17.252}]
2019-12-21 16:47:17.264  WARN 7156 --- [nio-8080-exec-4] o.s.c.t.b.l.TaskBatchExecutionListener   : This job was executed outside the scope of a task but still used the task listener.
2019-12-21 16:47:17.271  INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.job.SimpleStepHandler     : Executing step: [importStep]
2019-12-21 16:47:17.322  INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.step.AbstractStep         : Step: [importStep] executed in 51ms
2019-12-21 16:47:17.327  INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.job.SimpleStepHandler     : Executing step: [totalCountStep]
Total word count: 208
2019-12-21 16:47:17.336  INFO 7156 --- [nio-8080-exec-4] o.s.batch.core.step.AbstractStep         : Step: [totalCountStep] executed in 9ms
2019-12-21 16:47:17.338  INFO 7156 --- [nio-8080-exec-4] o.s.b.c.l.support.SimpleJobLauncher      : Job: [FlowJob: [name=importWordsJob]] completed with the following parameters: [{startDate=2019-12-21T16:47:17.252}] and the following status: [COMPLETED] in 75ms

Git source code

You can find the complete source code at the below Git location. The source code also includes the multiple data source configuration with cloud task configuration.
https://github.com/thetechnojournals/spring-tutorials/tree/master/SpringBatchTutorial
