Comments (3)
Lucas Ward commented
After looking at the issue, the developer really only needs to specify how many skips to allow, since that is all they care about. Rollbackcount was left in the StepExecution because it is an extremely useful statistic for operations teams. It also doesn't matter to that team what happened after the rollback (retry or recover); they just need to know how many times a particular job rolled back.
from spring-batch.
Tommy C. Trang commented
In the batch summary statistics, we definitely need to know how many records were skipped so we can investigate and handle those records after the original run. At C-IV, we used a preset maximum number of allowable skipped records. If that maximum is exceeded, the non-fatal exception is converted into a fatal exception to stop the run. This indicates that something is seriously wrong with the input. We haven't used a percentage of records for allowable skips because, at the beginning of the run, we do not know the total number of records that will be processed. We didn't calculate that total because of the additional processing required. A few options to get the total number of records upfront are:
- Run the SQL twice, once for the total count and once for the result set to process.
Problem: The driving query is run twice.
- Run the SQL once and make the result set scrollable. Go to the last record to get the record position, which gives the total number of records.
Problem: Scrollable prepared statements and result sets are more expensive. The architecture we used is cursor-driven and keeps the driving query result set open for the entire run. Using the scrollable option here loads the entire result set into memory, which can be problematic for memory utilization.
- Run the SQL once and load the data into an ArrayList, then close the result set.
Problem: All the data is held in an ArrayList and uses a lot of memory. The ArrayList is in scope for the entire run.
- Don't use a cursor-driven approach. Run the driving query to retrieve only the record keys and load them into an ArrayList, then close the cursor. Run the driving query again to retrieve all the required data for a specific record when processing it.
Problem: The driving query is more often than not the longest-running SQL in the program. This approach runs the driving query logic once for each record, which can be a performance problem. In my past projects, retrieving as much information as possible in a single SQL statement in the driving query has usually been best for performance.
In conclusion, I think specifying a percentage of records for skips is a nice-to-have, but I don't think it is worth the effort or the performance hit. I like the simple approach of specifying a single integer value for allowable skips before stopping the batch run due to a large number of skipped records. What to do with the skipped records, whether to re-run them automatically or handle them manually the next day, is a separate topic. There should be a small, finite number of possibilities for re-processing skipped records.
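A minimal sketch of the fixed-integer skip limit described above, in plain Java. The names `SkipLimitTracker`, `SkipLimitExceededException`, and `SkipLimitDemo` are hypothetical and are not part of the Spring Batch API; the sketch assumes a `NumberFormatException` stands in for the non-fatal, per-record error, and that "exceeded" means one more skip than the limit allows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical fatal exception raised once the skip limit is exceeded. */
class SkipLimitExceededException extends RuntimeException {
    SkipLimitExceededException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical helper that counts skipped records against a fixed integer limit. */
class SkipLimitTracker {
    private final int skipLimit;
    private final List<String> skippedKeys = new ArrayList<>();

    SkipLimitTracker(int skipLimit) {
        if (skipLimit < 0) {
            throw new IllegalArgumentException("skipLimit must not be negative");
        }
        this.skipLimit = skipLimit;
    }

    /** Records a skip, or converts the non-fatal cause into a fatal exception. */
    void recordSkip(String recordKey, Exception cause) {
        if (skippedKeys.size() >= skipLimit) {
            throw new SkipLimitExceededException(
                    "Skip limit of " + skipLimit + " exceeded at record " + recordKey, cause);
        }
        // Keep the key so the record can be investigated after the run.
        skippedKeys.add(recordKey);
    }

    int getSkipCount() {
        return skippedKeys.size();
    }

    List<String> getSkippedKeys() {
        return Collections.unmodifiableList(new ArrayList<>(skippedKeys));
    }
}

class SkipLimitDemo {
    /** Sums the parseable records; unparseable ones are skipped via the tracker. */
    static int sumValid(List<String> records, SkipLimitTracker tracker) {
        int sum = 0;
        for (String record : records) {
            try {
                sum += Integer.parseInt(record);
            } catch (NumberFormatException e) {
                tracker.recordSkip(record, e); // throws once the limit is exceeded
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        SkipLimitTracker tracker = new SkipLimitTracker(2);
        int sum = sumValid(Arrays.asList("1", "x", "2", "y", "3"), tracker);
        System.out.println("sum=" + sum + ", skipped=" + tracker.getSkippedKeys());
    }
}
```

Because the limit is an absolute count, the check needs no knowledge of the total number of records, so none of the extra queries listed above are required. The skipped keys are retained so they can be reported in the batch summary and re-processed later.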
Lucas Ward commented
Skip Limit has been added to StepConfiguration. There probably needs to be some code in the DefaultStepExecutor to deal with this value, but that can be created as a new issue.