Spring Batch FlatFileItemReader: CSV Reader Example

Spring Batch provides a FlatFileItemReader that we can use to read data from flat files, including CSV files. Here’s an example of how to configure and use FlatFileItemReader to read data from a CSV file in a Spring Batch job.

1. CSV File and Model

For demo purposes, we will use the following CSV file:

Lokesh,Gupta,41,true
Brian,Schultz,42,false
John,Cena,43,true
Albert,Pinto,44,false

Then we need to create a domain object to represent the data.

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class Person {

    String firstName;
    String lastName;
    Integer age;
    Boolean active;
}

2. Configuring FlatFileItemReader

The org.springframework.batch.item.file.FlatFileItemReader consists of two main components:

A Spring Resource that represents the file to be read
An implementation of the LineMapper interface (conceptually similar to RowMapper in Spring JDBC). When reading a flat file, each line is passed to the LineMapper as a String to be parsed.

The LineMapper internally consists of a LineTokenizer and FieldSetMapper. The LineTokenizer implementation parses the line into a FieldSet (similar to columns in a database row). The FieldSetMapper later maps the FieldSets to a domain object.
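
To make the two-stage flow concrete, here is a plain-Java sketch with no Spring dependencies. The names LineMappingSketch, PersonRow, tokenize, and mapToPerson are hypothetical stand-ins for the tokenizer and mapper roles; the real Spring Batch classes handle much more (quoted fields, richer type conversion, error reporting).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Plain-Java illustration of the LineTokenizer -> FieldSetMapper flow.
// All names here are hypothetical; this is not Spring Batch code.
final class LineMappingSketch {

  // Minimal stand-in for the Person domain object.
  record PersonRow(String firstName, String lastName, int age, boolean active) {}

  // Step 1 (tokenizer role): split the raw line and label each token with a column name.
  static Map<String, String> tokenize(String line, String delimiter, String... names) {
    String[] tokens = line.split(Pattern.quote(delimiter), -1);
    if (tokens.length != names.length) {
      throw new IllegalArgumentException(
          "Expected " + names.length + " tokens but found " + tokens.length);
    }
    Map<String, String> fields = new LinkedHashMap<>();
    for (int i = 0; i < names.length; i++) {
      fields.put(names[i], tokens[i].trim());
    }
    return fields;
  }

  // Step 2 (mapper role): convert the named fields into a typed object.
  static PersonRow mapToPerson(Map<String, String> fields) {
    return new PersonRow(
        fields.get("firstName"),
        fields.get("lastName"),
        Integer.parseInt(fields.get("age")),
        Boolean.parseBoolean(fields.get("active")));
  }

  public static void main(String[] args) {
    Map<String, String> fields =
        tokenize("Lokesh,Gupta,41,true", ",", "firstName", "lastName", "age", "active");
    System.out.println(mapToPerson(fields));
  }
}
```

Keeping the two roles separate is what lets us swap the tokenizer (for example, a different delimiter) without touching the mapping logic.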

2.1. Delimited Files (CSV Files)

In delimited files, a character acts as a divider between the fields of each record. Each record is split on the delimiter, and the resulting columns are mapped to the POJO fields. The default delimiter is a comma.

A reader for a delimited flat file can be built using the FlatFileItemReaderBuilder. In the following example, csvFile is a Resource that points to the input file.

@Bean
@StepScope
public FlatFileItemReader<Person> personItemReader() {

  return new FlatFileItemReaderBuilder<Person>()
      .name("personItemReader")
      .delimited()
      .names("firstName", "lastName", "age", "active")
      .targetType(Person.class)
      .resource(csvFile)
      .build();
}

If we want to configure a different delimiter, we can define a custom DelimitedLineTokenizer bean.

@Bean
public DelimitedLineTokenizer tokenizer() {

  var tokenizer = new DelimitedLineTokenizer();
  tokenizer.setDelimiter("#");  // Specify a different delimiter. Default is comma.
  tokenizer.setNames("firstName", "lastName", "age", "active");
  return tokenizer;
}
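
The tokenizer bean still has to be handed to the reader. A minimal sketch of one way to do that is shown below, assuming the builder's lineTokenizer(..) method and a Person class in the same package. The class name HashDelimitedReaderConfig, the bean name, and the file path csv/inputData-hash.txt are hypothetical and used only for illustration.

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class HashDelimitedReaderConfig {

  @Bean
  public FlatFileItemReader<Person> hashDelimitedReader(DelimitedLineTokenizer tokenizer) {

    return new FlatFileItemReaderBuilder<Person>()
        .name("hashDelimitedReader")
        // Use the custom tokenizer bean instead of .delimited()
        .lineTokenizer(tokenizer)
        .targetType(Person.class)
        // Hypothetical file location; replace with the actual input file
        .resource(new ClassPathResource("csv/inputData-hash.txt"))
        .build();
  }
}
```

For a one-off change of delimiter, calling .delimited().delimiter("#") on the builder is usually shorter than declaring a separate tokenizer bean.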

2.2. Fixed-Width Files

When working on legacy mainframe systems, we may encounter fixed-width files due to the way COBOL and other such technologies declare their storage.

In the absence of a delimiter (or any other metadata), we have to rely on the length of each field in the file. Consider the following fixed-width file:

Lokesh    Gupta     41  true
Brian     Schultz   42  false
John      Cena      43  true
Albert    Pinto     44  false

In the above file, the lengths of the fields are:

firstName	10
lastName	10
age	4
active	5

The equivalent FlatFileItemReader can be built with the fixedLength() and columns() methods, where columns() specifies the character range of each field.

@Bean
@StepScope
public FlatFileItemReader<Person> personItemReaderFixedWidth() {

  return new FlatFileItemReaderBuilder<Person>()
    .name("personItemReader")
    .fixedLength()
    // Ranges are 1-based and inclusive; they follow the field lengths 10, 10, 4, and 5.
    .columns(new Range(1, 10), new Range(11, 20), new Range(21, 24), new Range(25, 29))
    .names("firstName", "lastName", "age", "active")
    .targetType(Person.class)
    .resource(csvFile)
    .build();
}
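
Column ranges are 1-based and inclusive, so they can be derived from the cumulative field lengths. The following plain-Java sketch shows that arithmetic and how a range translates to a substring. FixedWidthSketch, rangesFor, and slice are hypothetical names and not part of Spring Batch.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of 1-based, inclusive column ranges.
// FixedWidthSketch and its methods are hypothetical names, not Spring Batch API.
final class FixedWidthSketch {

  // Turns field lengths into {min, max} pairs, e.g. 10, 10, 4, 5 -> 1-10, 11-20, 21-24, 25-29.
  static List<int[]> rangesFor(int... lengths) {
    List<int[]> ranges = new ArrayList<>();
    int start = 1;
    for (int length : lengths) {
      ranges.add(new int[] {start, start + length - 1});
      start += length;
    }
    return ranges;
  }

  // Cuts one field per range; a range that runs past the end of the line is truncated.
  static List<String> slice(String line, List<int[]> ranges) {
    List<String> tokens = new ArrayList<>();
    for (int[] range : ranges) {
      int from = Math.min(range[0] - 1, line.length()); // 1-based min -> 0-based start index
      int to = Math.min(range[1], line.length());       // inclusive max == exclusive end index
      tokens.add(line.substring(from, to).trim());
    }
    return tokens;
  }

  public static void main(String[] args) {
    List<int[]> ranges = rangesFor(10, 10, 4, 5);
    System.out.println(slice("Lokesh    Gupta     41  true", ranges));
  }
}
```

Unlike this lenient sketch, Spring Batch's fixed-length tokenizer is strict about line length by default, so in practice each record generally needs to be padded to the full width of the last range.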

2.3. FieldSetMapper

By default, Spring Batch uses BeanWrapperFieldSetMapper, a FieldSetMapper implementation based on a fuzzy search of bean property paths. It tries to match the column names with the field names in the POJO class. For example, the BeanWrapperFieldSetMapper will call Person#setFirstName, Person#setLastName, and so on, based on the names of the columns configured in the LineTokenizer.
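
The idea of resolving a setter from a column name can be sketched in plain Java with reflection. This is only an illustration of name-based mapping: SetterMappingSketch and its nested Person class are hypothetical, and the real BeanWrapperFieldSetMapper is considerably more capable (nested property paths, fuzzy matching, pluggable type conversion).

```java
import java.lang.reflect.Method;

// Plain-Java illustration of name-based property mapping via setters.
// SetterMappingSketch is a hypothetical name, not Spring Batch API.
final class SetterMappingSketch {

  // Trimmed-down stand-in for the article's Person class.
  static class Person {
    private String firstName;
    private Integer age;
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public void setAge(Integer age) { this.age = age; }
    public String getFirstName() { return firstName; }
    public Integer getAge() { return age; }
  }

  // Finds the setter named after the column ("age" -> setAge) and invokes it
  // with the raw value converted to the setter's parameter type.
  static void setProperty(Object target, String column, String rawValue) {
    String setterName = "set" + Character.toUpperCase(column.charAt(0)) + column.substring(1);
    for (Method method : target.getClass().getMethods()) {
      if (method.getName().equals(setterName) && method.getParameterCount() == 1) {
        try {
          method.invoke(target, convert(rawValue, method.getParameterTypes()[0]));
        } catch (ReflectiveOperationException e) {
          throw new IllegalStateException("Could not set property: " + column, e);
        }
        return;
      }
    }
    throw new IllegalArgumentException("No setter found for column: " + column);
  }

  // Supports only the property types used by the article's Person class.
  static Object convert(String rawValue, Class<?> type) {
    if (type == Integer.class) {
      return Integer.valueOf(rawValue);
    }
    if (type == Boolean.class) {
      return Boolean.valueOf(rawValue);
    }
    return rawValue;
  }

  public static void main(String[] args) {
    Person person = new Person();
    setProperty(person, "firstName", "Lokesh");
    setProperty(person, "age", "41");
    System.out.println(person.getFirstName() + " " + person.getAge());
  }
}
```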

If the column names differ significantly from the POJO field names, or the structure of the fields differs, we can define our own implementation of FieldSetMapper.

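import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;

public class PersonFieldSetMapper implements FieldSetMapper<Person> {

  @Override
  public Person mapFieldSet(FieldSet fieldSet) {

    Person person = new Person();
    // Read each column by the name configured on the tokenizer.
    person.setFirstName(fieldSet.readString("firstName"));
    person.setLastName(fieldSet.readString("lastName"));
    person.setAge(fieldSet.readInt("age"));
    person.setActive(fieldSet.readBoolean("active"));
    return person;
  }
}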

Then pass this PersonFieldSetMapper to the FlatFileItemReaderBuilder as follows:

@Bean
@StepScope
public FlatFileItemReader<Person> personItemReader() {

  return new FlatFileItemReaderBuilder<Person>()
      .name("personItemReader")
      .delimited()
      .names("firstName", "lastName", "age", "active")
      .fieldSetMapper(new PersonFieldSetMapper())
      .resource(csvFile)
      .build();
}

3. Read CSV with FlatFileItemReader

In the following configuration, the FlatFileItemReader is configured to read a CSV file. The DelimitedLineTokenizer is used to specify the column names, and the BeanWrapperFieldSetMapper is used to map each line to a Person object.

We’ll need to customize the ItemProcessor and ItemWriter beans according to the business logic and data destination. This configuration writes data to the database.

Finally, create a Job that includes the Steps.

import com.howtodoinjava.demo.batch.jobs.csvToDb.model.Person;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.transaction.PlatformTransactionManager;

import javax.sql.DataSource;

@Configuration
public class CsvToDatabaseJob {

  public static final Logger logger = LoggerFactory.getLogger(CsvToDatabaseJob.class);

  private static final String INSERT_QUERY = """
      insert into person (first_name, last_name, age, is_active)
      values (:firstName,:lastName,:age,:active)""";

  private final JobRepository jobRepository;

  public CsvToDatabaseJob(JobRepository jobRepository) {
    this.jobRepository = jobRepository;
  }

  @Value("classpath:csv/inputData.csv")
  private Resource inputFeed;

  @Bean(name="insertIntoDbFromCsvJob")
  public Job insertIntoDbFromCsvJob(Step step1) {

    // Only step1 is defined in this configuration, so it is the only step injected.
    var name = "Persons Import Job";
    var builder = new JobBuilder(name, jobRepository);
    return builder.start(step1).build();
  }

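  @Bean
  public Step step1(ItemReader<Person> reader,
                    ItemWriter<Person> writer,
                    ItemProcessor<Person, Person> processor,
                    PlatformTransactionManager txManager) {

    // The ItemProcessor bean is not defined in this class and must be provided elsewhere.
    var name = "INSERT CSV RECORDS To DB Step";
    var builder = new StepBuilder(name, jobRepository);
    return builder
        // Chunk-oriented step; the chunk size of 10 is an assumed value, tune it as needed.
        .<Person, Person>chunk(10, txManager)
        .reader(reader)
        .processor(processor)
        .writer(writer)
        .build();
  }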

  @Bean
  public FlatFileItemReader<Person> reader(
      LineMapper<Person> lineMapper) {
    var itemReader = new FlatFileItemReader<Person>();
    itemReader.setLineMapper(lineMapper);
    itemReader.setResource(inputFeed);
    return itemReader;
  }

  @Bean
  public DefaultLineMapper<Person> lineMapper(LineTokenizer tokenizer,
                                              FieldSetMapper<Person> fieldSetMapper) {
    var lineMapper = new DefaultLineMapper<Person>();
    lineMapper.setLineTokenizer(tokenizer);
    lineMapper.setFieldSetMapper(fieldSetMapper);
    return lineMapper;
  }

  @Bean
  public BeanWrapperFieldSetMapper<Person> fieldSetMapper() {
    var fieldSetMapper = new BeanWrapperFieldSetMapper<Person>();
    fieldSetMapper.setTargetType(Person.class);
    return fieldSetMapper;
  }

  @Bean
  public DelimitedLineTokenizer tokenizer() {
    var tokenizer = new DelimitedLineTokenizer();
    tokenizer.setDelimiter(",");
    tokenizer.setNames("firstName", "lastName", "age", "active");
    return tokenizer;
  }

  @Bean
  public JdbcBatchItemWriter<Person> writer(DataSource dataSource) {
    var provider = new BeanPropertyItemSqlParameterSourceProvider<Person>();
    var itemWriter = new JdbcBatchItemWriter<Person>();
    itemWriter.setDataSource(dataSource);
    itemWriter.setSql(INSERT_QUERY);
    itemWriter.setItemSqlParameterSourceProvider(provider);
    return itemWriter;
  }

}
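
The step1 bean takes an ItemProcessor<Person, Person> parameter, but the configuration above does not define one. A minimal sketch of a pass-through processor is shown below, assuming no transformation is needed; the class name PersonProcessorConfig is hypothetical, and real business logic would replace the lambda body.

```java
import org.springframework.batch.item.ItemProcessor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class PersonProcessorConfig {

  // Hypothetical pass-through processor: returns each item unchanged.
  // Returning null from an ItemProcessor filters the item out, so it is never written.
  @Bean
  public ItemProcessor<Person, Person> processor() {
    return person -> person;
  }
}
```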

If the above configuration seems verbose, we can configure the DefaultLineMapper, DelimitedLineTokenizer, and BeanWrapperFieldSetMapper inline in the FlatFileItemReader bean itself.

@Bean
public FlatFileItemReader<Person> reader() {

  FlatFileItemReader<Person> reader = new FlatFileItemReader<>();

  reader.setResource(inputFeed);  // the Resource field injected with @Value above
  
  reader.setLineMapper(new DefaultLineMapper<Person>() {{
    setLineTokenizer(new DelimitedLineTokenizer() {{
      setNames("firstName", "lastName", "age", "active");
    }});
    setFieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {{
      setTargetType(Person.class);
    }});
  }});

  return reader;
}

4. Demo

4.1. Maven

Make sure you have the following dependencies in the project:

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-quartz</artifactId>
</dependency>
<dependency>
  <groupId>com.h2database</groupId>
  <artifactId>h2</artifactId>
  <scope>runtime</scope>
</dependency>

4.2. Run the Application

Now run the application and watch the console logs.

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ApplicationContext;

@SpringBootApplication
public class BatchProcessingApplication implements CommandLineRunner {

  private final JobLauncher jobLauncher;
  private final ApplicationContext applicationContext;

  public BatchProcessingApplication(JobLauncher jobLauncher, ApplicationContext applicationContext) {
    this.jobLauncher = jobLauncher;
    this.applicationContext = applicationContext;
  }

  public static void main(String[] args) {
    SpringApplication.run(BatchProcessingApplication.class, args);
  }

  @Override
  public void run(String... args) throws Exception {

    Job job = (Job) applicationContext.getBean("insertIntoDbFromCsvJob");

    JobParameters jobParameters = new JobParametersBuilder()
        .addString("JobID", String.valueOf(System.currentTimeMillis()))
        .toJobParameters();

    var jobExecution = jobLauncher.run(job, jobParameters);

    // Query the execution on every iteration so that the loop sees status changes.
    while (jobExecution.isRunning()) {
      System.out.println("Still running...");
      Thread.sleep(5000L);
    }
  }
}

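The program output is shown below. The PersonItemReadListener and JobCompletionNotificationListener entries come from listener classes that are not shown in this article.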

2023-11-29T14:32:54.612+05:30  INFO 24044 --- [main] o.s.b.c.l.support.SimpleJobLauncher      : Job: [SimpleJob: [name=Persons Import Job]] launched with the following parameters: [{'JobID':'{value=1701248574579, type=class java.lang.String, identifying=true}'}]
2023-11-29T14:32:54.631+05:30  INFO 24044 --- [main] o.s.batch.core.job.SimpleStepHandler     : Executing step: [INSERT CSV RECORDS To DB Step]
2023-11-29T14:32:54.647+05:30  INFO 24044 --- [main] c.h.d.b.j.c.l.PersonItemReadListener     : Reading a new Person Record
2023-11-29T14:32:54.662+05:30  INFO 24044 --- [main] c.h.d.b.j.c.l.PersonItemReadListener     : New Person record read : Person(firstName=Lokesh, lastName=Gupta, age=41, active=true)
2023-11-29T14:32:54.664+05:30  INFO 24044 --- [main] c.h.d.b.j.c.l.PersonItemReadListener     : Reading a new Person Record
2023-11-29T14:32:54.665+05:30  INFO 24044 --- [main] c.h.d.b.j.c.l.PersonItemReadListener     : New Person record read : Person(firstName=Brian, lastName=Schultz, age=42, active=false)
2023-11-29T14:32:54.665+05:30  INFO 24044 --- [main] c.h.d.b.j.c.l.PersonItemReadListener     : Reading a new Person Record
2023-11-29T14:32:54.665+05:30  INFO 24044 --- [main] c.h.d.b.j.c.l.PersonItemReadListener     : New Person record read : Person(firstName=John, lastName=Cena, age=43, active=true)
2023-11-29T14:32:54.666+05:30  INFO 24044 --- [main] c.h.d.b.j.c.l.PersonItemReadListener     : Reading a new Person Record
2023-11-29T14:32:54.666+05:30  INFO 24044 --- [main] c.h.d.b.j.c.l.PersonItemReadListener     : New Person record read : Person(firstName=Albert, lastName=Pinto, age=44, active=false)
2023-11-29T14:32:54.666+05:30  INFO 24044 --- [main] c.h.d.b.j.c.l.PersonItemReadListener     : Reading a new Person Record
2023-11-29T14:32:54.676+05:30  INFO 24044 --- [main] o.s.batch.core.step.AbstractStep         : Step: [INSERT CSV RECORDS To DB Step] executed in 44ms
2023-11-29T14:32:54.679+05:30  INFO 24044 --- [main] .j.c.l.JobCompletionNotificationListener : JOB FINISHED !!

Drop me your questions in the comments section.

Happy Learning !!

Source Code on Github

Doug

March 7, 2024 at 4:16 am

FlatFileItemReader throws this exception >>>
org.springframework.batch.item.file.NonTransientFlatFileException: Unable to read from resource:
- Lokesh Gupta
  
  March 7, 2024 at 1:38 pm
  
  Check that the file exists and is accessible by the application. Also check file structure (incorrect delimiters, missing columns). Consider enabling the debug logs for more detailed information.
Sreedhar

February 14, 2023 at 5:06 am

Can you please write test case for this example using Mockito. Thanks in advance.
sanket

October 20, 2022 at 12:31 am

What happen if the CSV file is very large
- Lokesh Gupta
  
  October 25, 2022 at 12:24 pm
  
  Consider using OpenCSV for parsing.
David Fanjkutic

May 20, 2020 at 3:46 am

this does not work for CSV records that are multi-line
Amith

April 6, 2020 at 1:20 pm

How can the above code be enhanced to validate the inputs. Lets assume firstname and lastname can be of size max 5 length. How can the code be modified to include the constraints along with exception handling
Sweety

January 2, 2020 at 6:14 pm

Hi, I am new to Java. Can you please tell me what changes are required for a Postgres DB?
chaimae

July 12, 2019 at 11:07 pm

when im trying to run the project i got this error :
APPLICATION FAILED TO START
***************************

Description:

Field jobLauncher in com.example.demo.App required a bean of type ‘org.springframework.batch.core.launch.JobLauncher’ that could not be found.

The injection point has the following annotations:
– @org.springframework.beans.factory.annotation.Autowired(required=true)

Action:

Consider defining a bean of type ‘org.springframework.batch.core.launch.JobLauncher’ in your configuration.
- Dhruv
  
  July 23, 2019 at 4:50 pm
  
  Hi chaimae,
  You got resolved this issue? I am having same issue.
  
  Thanks
govind

May 21, 2019 at 12:28 pm

I am not able to add new files to the directory. The reader doesn’t read records from newly added files in the directory.
Priyanka Sinha

October 30, 2018 at 6:20 pm

Hi,

I am working on spring application. I am reading csv or excel file with very large dataset. Please provide me some reference for this using spring batch and printing to console for Spring application, not SpringBoot app. Above given example is for SpringBoot application.

I have written BatchConfig and console writer classes. I am not able to test it as App class mentioned above is for springboot application, not for spring. Could you please help me in writing App class for spring application and testing it?

Thanks in advance.
Priyanka Sinha

October 30, 2018 at 6:15 pm

Please provide me some reference for reading csv or excel file using spring batch and printing to console for Spring application, not SpringBoot app. Above given example is for SpringBoot application.

Thanks in advance.