
Spring Batch job with Spring Cloud Data Flow server

Spring Cloud Data Flow server is used to execute batch jobs, command-line tasks, or streams as microservices. Here we will learn how to register and execute a batch job with Spring Cloud Data Flow server. These jobs are executed in a separate JVM, which is created and destroyed on demand by the Spring Cloud Data Flow server.

How to set up Spring Cloud Data Flow server

Please refer to the below post on how to set up the Cloud Data Flow server:
https://www.thetechnojournals.com/2019/12/setting-up-spring-cloud-data-flow-server.html

Spring Batch job

A Spring Batch job is used for batch processing or background job execution where we want to process a limited set of data. This tutorial will not show how to create batch jobs; instead, we will learn what is required to register a batch job with Spring Cloud Data Flow server and how to execute it.

Spring Batch job registration with Cloud Data Flow server

To register a Spring Batch job, we need to enable Spring Cloud Task in the application. Below are the steps required.

Enable task in your batch application

  • Add the below Maven dependency for Spring Cloud Task.

        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-task</artifactId>
        </dependency>
    
  • Put the @EnableTask annotation on your Spring Boot main class, as in the sketch below.
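    For reference, a minimal sketch of such a main class is given below; the package and class names are hypothetical, and @EnableBatchProcessing is shown only because it is typically already present in a Spring Batch application.

        package com.example.batch; // hypothetical package

        import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
        import org.springframework.boot.SpringApplication;
        import org.springframework.boot.autoconfigure.SpringBootApplication;
        import org.springframework.cloud.task.configuration.EnableTask;

        // @EnableTask lets Spring Cloud Data Flow record the start, end and
        // exit code of this application as a task execution in its database.
        @EnableTask
        @EnableBatchProcessing
        @SpringBootApplication
        public class BatchTutorialApplication {

            public static void main(String[] args) {
                SpringApplication.run(BatchTutorialApplication.class, args);
            }
        }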
  • Batch data source configuration
    Both the Spring Batch job and the Spring Cloud Data Flow server use a database to keep the data pertaining to batch jobs or cloud tasks, along with their execution state and context data. You will see the below error while executing tasks if the two are not pointing to the same database.

    Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
    2019-12-08 15:13:57.058 ERROR 12396 --- [           main] o.s.boot.SpringApplication               : Application run failed

    org.springframework.context.ApplicationContextException: Failed to start bean 'taskLifecycleListener'; nested exception is java.lang.IllegalArgumentException: Invalid TaskExecution, ID 1 not found

    The Data Flow server needs to keep references to the batch job tables, and if the two use different databases it may not be able to resolve the job IDs. Hence we need to make sure that both the batch job application and the Cloud Data Flow server point to the same database, as in the configuration sketch below.
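    A minimal sketch of such a configuration is given below, assuming a local MySQL database named scdf; the URL, credentials and driver class are placeholders. The batch application's application.properties would contain:

        # application.properties of the batch job (all values are placeholders)
        spring.datasource.url=jdbc:mysql://localhost:3306/scdf
        spring.datasource.username=root
        spring.datasource.password=secret
        spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver

    The Data Flow server is then started against the same database, for example by overriding its datasource properties on the command line:

        java -jar spring-cloud-dataflow-server-2.2.1.RELEASE.jar \
          --spring.datasource.url=jdbc:mysql://localhost:3306/scdf \
          --spring.datasource.username=root \
          --spring.datasource.password=secret \
          --spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver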

Registering the batch application

We can register the application in two ways: one using the Spring Cloud Data Flow server UI dashboard and the other using the shell.
  • Using Spring Cloud Data Flow server UI dashboard
    Open the link http://localhost:9393/dashboard/#/apps in your browser and click on the "Add Application(s)" link.
    [Screenshot: app registration]
    In the below screen, select the highlighted option.
    [Screenshot: app registration]
    Fill in the required details in the below screen. We have to provide the jar location of our batch application.
    [Screenshot: app registration]
    Click on the "Register Application" link in the below screen to complete the registration.
    [Screenshot: app registration]
    Once registration is complete, you can see the job under the "App" link, as given below.
    [Screenshot: app registration]
    Now we need to create a task for the same job. Click on the "Tasks" link in the left navigation and then click the "Create Task(s)" link.
    [Screenshot: app registration]
    Then, in the below screen, drag & drop the job from the left pane to the right pane, connect it to the start and end points, and click on the "Create Task" button.
    [Screenshot: task creation]
    Give a name to your task and click on the create button.
    Once the task is created, it will take you to a screen where you can see the list of tasks.
  • Using the shell
    Please follow the below steps.
    Download the shell jar file from the below location.
    https://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-shell/2.2.1.RELEASE/spring-cloud-dataflow-shell-2.2.1.RELEASE.jar
    For the latest version, please refer to https://dataflow.spring.io/docs/installation/
    Once downloaded, execute the below command in a terminal or command prompt to run the shell application.
    java -jar spring-cloud-dataflow-shell-2.2.1.RELEASE.jar

    You will see the below output in the terminal, along with a shell console where you can execute shell commands.
    [Screenshot: data flow shell]
    Now execute the below command to register your batch application jar with Data Flow.
    app register --name my-batch-job --type task --uri file:////jobs/batch-tutorial-0.0.1-SNAPSHOT.jar

    Then execute the below command to create a task from the registered batch job.
    task create my-batch-job-task --definition my-batch-job

    Now check the Data Flow server dashboard; you will see the app and the task registered with the Data Flow server. You can also verify this from the shell itself, as shown below.
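    As a quick sanity check, the registered app and task definition can also be listed from the same shell session with the shell's built-in listing commands:

    app list
    task list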

Executing the registered task (batch job) using Cloud Data Flow server

We have registered our batch job as a task, which we will now execute. This can be done in the below two ways.
  • Using Spring Cloud Data Flow server UI dashboard
    Open the Cloud Data Flow UI dashboard and, under the "Tasks" link, click on the task you want to run.
    You will see the below screen. Click on the "Launch" button.
    [Screenshot: launch task]
    In this screen you can provide the parameters required to run your job, and then click the "Launch the task" button.
    [Screenshot: launch task]
  • Using the shell
    Run the below command in a terminal or command prompt to open the shell.
    java -jar spring-cloud-dataflow-shell-2.2.1.RELEASE.jar

    Then execute the below command to run the task.
    task launch my-batch-job-task
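    If your job needs parameters, they can also be passed at launch time, and the execution history can then be queried from the same shell. A short sketch follows; the run.date argument is only a hypothetical example.

    task launch my-batch-job-task --arguments "run.date=2019-12-08"
    task execution list --name my-batch-job-task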
    

Once the task is launched, we can check it under Tasks > Executions on the Data Flow UI dashboard. Click on the ID link to check the status of your task execution.
[Screenshot: launch task]
Then you will see the below screen, where you can check the status of your job. Scroll to the bottom of this screen and you will see the application execution log of your batch job. In my case I registered an application with two jobs, both of which executed successfully, as you can see in the application logs screenshot. Ideally, there is only one job to execute per application.
[Screenshot: task status]
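Besides the dashboard, the Data Flow server also exposes a REST API, which can be handy for scripting. A small sketch, assuming the server runs on the default port 9393; the execution ID 1 is a placeholder taken from the list response.

    # list all task executions recorded by the server
    curl http://localhost:9393/tasks/executions

    # fetch the details of a single execution by its ID
    curl http://localhost:9393/tasks/executions/1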

Other posts you may like to explore:
Configure multiple datasources with Spring Boot, Batch and Cloud Task
