spring-projects / spring-batch
Spring Batch is a framework for writing batch applications using Java and Spring
Home Page: http://projects.spring.io/spring-batch/
License: Apache License 2.0
Lucas Ward opened BATCH-6 and commented
All schema definition files (DB2, PostgreSQL, Derby) should be checked to ensure they are all accurate. There should also be only one version for all projects (probably in the container project).
Affects: 1.0-m1
Lucas Ward opened BATCH-32 and commented
The current implementation of statistics via JMX exposes a 'getStatistics()' method at the container level, which returns a properties object for all the steps running. The StepExecution object already contains all information about the currently running step (start time, end time, commit count, LUW count and module statistics) and would provide a natural mechanism. Another option is to allow the entire execution to be serializable to a String, which could be returned. This would allow a simple tool such as jconsole to work, or allow more advanced admin consoles to be created.
Affects: 1.0-m2
Dave Syer opened BATCH-34 and commented
Scenario: read 100 rows from the database and write them to a file. The output file can have at most 50 lines (rows). Does Spring Batch support creating two files, each containing 50 lines (filename.txt.001 and filename.txt.002)?
This is supported; however, there isn't any kind of 'max record count' that would signal to the FileOutputSource that a new file should be created. The output source would simply need to be closed and reopened. A wrapper could easily be written that would do this for you.
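The wrapper mentioned above could look roughly like the following. This is only a sketch: `RollingLineWriter` is a hypothetical class, not part of Spring Batch, and it writes plain lines with `java.nio` rather than going through a FileOutputSource. It closes the current file and opens the next one once a configured number of lines has been written, using the `.001`, `.002` suffixes from the scenario.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical wrapper (not a Spring Batch class): starts a new file every maxLines lines.
class RollingLineWriter implements AutoCloseable {
    private final Path directory;
    private final String baseName;
    private final int maxLines;
    private BufferedWriter current;
    private int linesInCurrent;
    private int fileIndex;

    RollingLineWriter(Path directory, String baseName, int maxLines) {
        if (maxLines < 1) {
            throw new IllegalArgumentException("maxLines must be at least 1");
        }
        this.directory = directory;
        this.baseName = baseName;
        this.maxLines = maxLines;
    }

    void write(String line) {
        try {
            // Roll over lazily, so no empty trailing file is created.
            if (current == null || linesInCurrent == maxLines) {
                openNextFile();
            }
            current.write(line);
            current.newLine();
            linesInCurrent++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void openNextFile() throws IOException {
        if (current != null) {
            current.close();
        }
        fileIndex++;
        // Produces baseName.001, baseName.002, ...
        Path next = directory.resolve(String.format("%s.%03d", baseName, fileIndex));
        current = Files.newBufferedWriter(next);
        linesInCurrent = 0;
    }

    int filesOpened() {
        return fileIndex;
    }

    @Override
    public void close() {
        try {
            if (current != null) {
                current.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing 100 rows with `maxLines` set to 50 would leave two files of 50 lines each. The sketch ignores restart and transactional concerns, which a real output source would have to handle.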
Affects: 1.0-m2
Dave Syer opened BATCH-45 and commented
Create a POJO adapter for batch Tasklet/RepeatCallback. Similar to MessageHandlerAdapter. The only tough question is what to do about the return value - the generated callback will have to return false (or something) at some point to say that the job is complete.
A neat extra feature would be to allow methods with signature Object handle(Object state), or a strongly typed version of the same, so that implementations can handle their own state (the return value will be passed in to the next call as input).
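A minimal sketch of the state-passing variant, under stated assumptions: `ContinuableCallback` is a hypothetical stand-in for the Tasklet/RepeatCallback contract, `StatefulPojoAdapter` is an invented name, and the convention that a `null` return value means "complete" is this sketch's own answer to the return-value question, not a decision recorded in the issue.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the callback contract: false means "no more work".
interface ContinuableCallback {
    boolean doInIteration();
}

// Hypothetical adapter around a POJO method shaped like "S handle(S state)".
// Assumption for this sketch: the handler returns null when the job is complete.
class StatefulPojoAdapter<S> implements ContinuableCallback {
    private final UnaryOperator<S> handler;
    private S state;

    StatefulPojoAdapter(UnaryOperator<S> handler, S initialState) {
        this.handler = handler;
        this.state = initialState;
    }

    @Override
    public boolean doInIteration() {
        // The previous return value is passed in as the input of the next call.
        state = handler.apply(state);
        return state != null;
    }
}
```

A handler such as `n -> n < 3 ? n + 1 : null` started at 0 would report "continue" three times and then "complete" on the fourth call.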
Affects: 1.0-m2
Issue Links:
Referenced from: commits 678ccc0
Lucas Ward opened BATCH-39 and commented
The feature list for 1.0 has always included both driving-query and cursor-based SQL input sources. However, the DrivingQuery solution that was created needs refactoring. Much of the complexity in this input source arises because the current solution tries to handle composite keys as well as single keys. A single-key solution should be created, since it can be very simple, requiring only a driving query that returns one column and a details query with only one required parameter (one question mark).
Affects: 1.0-m1
Dave Syer opened BATCH-14 and commented
Create efficient framework for launching / bootstrapping jobs in the common case of just launching a main method. This is the most common use case by far (e.g. kicked off by external scheduler like Autosys).
Start a named job for a given container
Organise and manage a large collection of jobs - make it easy to add or edit an existing job
Report or return error code at the end of a process (System.exit()). Error codes have to be easily customisable to fit business exceptions.
Dave Syer opened BATCH-37 and commented
Refactor the StepExecutorFactory to make it look up a StepExecutor based only on the current Job/Step instead of the StepConfiguration. We shouldn't be putting completion policies and other low level concerns in the StepConfiguration, but clearly there might in special circumstances be a need for a different StepExecutor depending on the Job/Step. I propose a mapping style (StepExecutorFactory -> StepExecutorResolver) like a URL mapping in the web tier.
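The mapping style proposed above might be sketched as follows. All names here are hypothetical: `StepExecutorSketch` is a placeholder for the real StepExecutor, and the "job/step" and "job/*" key formats are assumptions chosen to resemble URL mappings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical placeholder for whatever executes a step.
interface StepExecutorSketch {
    String name();
}

// Hypothetical resolver: exact "job/step" match first, then "job/*", then a default.
class MappingStepExecutorResolver {
    private final Map<String, StepExecutorSketch> mappings = new HashMap<>();
    private final StepExecutorSketch defaultExecutor;

    MappingStepExecutorResolver(StepExecutorSketch defaultExecutor) {
        this.defaultExecutor = defaultExecutor;
    }

    MappingStepExecutorResolver register(String pattern, StepExecutorSketch executor) {
        mappings.put(pattern, executor);
        return this;
    }

    StepExecutorSketch resolve(String jobName, String stepName) {
        StepExecutorSketch exact = mappings.get(jobName + "/" + stepName);
        if (exact != null) {
            return exact;
        }
        StepExecutorSketch jobWide = mappings.get(jobName + "/*");
        return jobWide != null ? jobWide : defaultExecutor;
    }
}
```

With this shape, the common case needs no mapping at all, and only the special Job/Step combinations are registered explicitly.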
Affects: 1.0-m2
Dave Syer opened BATCH-21 and commented
OutputResource abstraction for file / stream output. It would remove some of the non-batch-specific common code in the open()/close() methods of the OutputSources. E.g.
if (file.exists()) {
    if (os.shouldDeleteIfExists) {
        file.delete();
    }
    else {
        throw new IOException("Resource already exists: " + resource);
    }
}
file.createNewFile();
There is no wrapper for an output stream in Spring Core, but maybe it would port over if we get it working.
Affects: 1.0-m1
Dave Syer opened BATCH-26 and commented
The read-ahead limit in ResourceLineReader is worrying. There is no guarantee that a) it will be large enough to hold the whole transaction, or b) that, if set very large, it won't cause performance issues with long transactions / large records.
You actually have to catch IOException on a reset() and look at the message to work out if the read ahead limit was breached. Yuck. It's sort of irrelevant anyway because even if we could determine that the read-ahead limit was breached, we would have no choice but to terminate the batch - effectively it is a failed rollback. The rollback may not have been caused by bad input data (generally that is not fatal for the transaction), but that doesn't rule out a deterministic problem that causes the same failure to happen on the next restart, ad infinitum.
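The behaviour described above can be reproduced with the JDK's own `java.io.BufferedReader`, which ResourceLineReader is assumed here to rely on. The sketch below is not Spring Batch code; `ReadAheadLimitDemo` is a hypothetical helper. It marks the stream, reads past the mark, and reports whether `reset()` still works. Note that the JDK only says a reset past the limit "may fail", so with a large internal buffer the mark can survive longer than the limit suggests.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class, not part of Spring Batch.
class ReadAheadLimitDemo {
    // Returns true if reset() succeeded after reading charsToRead characters past the mark.
    static boolean canRollBack(String input, int bufferSize, int readAheadLimit, int charsToRead) {
        try (BufferedReader reader = new BufferedReader(new StringReader(input), bufferSize)) {
            reader.mark(readAheadLimit);
            for (int i = 0; i < charsToRead; i++) {
                if (reader.read() == -1) {
                    break;
                }
            }
            // Simulates the rollback: go back to the last marked position.
            reader.reset();
            return true;
        } catch (IOException e) {
            // A breached limit surfaces only as a plain IOException,
            // so the cause cannot be told apart by type.
            return false;
        }
    }
}
```

A small buffer is used in the usage below so that the reader has to refill, which is the point at which a breached limit invalidates the mark: `canRollBack("abcdefghijklmnop", 4, 2, 2)` stays within the limit, while reading 10 characters with the same settings goes past it.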
Affects: 1.0-m1
Dave Syer opened BATCH-43 and commented
Calls to JobRepository come in pairs - could we simplify the interface? E.g. update(Job) is always linked with saveOrUpdate(JobExecution). The same is true for the Step* equivalent. The same is not quite true the other way round (saveOrUpdate does not always go with update), but that doesn't matter much, and it might be just as well to update the Job at the same time, even if it hasn't necessarily changed.
The new interface could use *ExecutionContext, or just collapse the two methods into one. The advantage of using the *ExecutionContext is that Dao implementations could use the rest of the context in special ways (e.g. inspect context to see if there are any errors to report / save / send). Need to be careful of cycles.
Affects: 1.0-m2
Robert Kasanicky opened BATCH-28 and commented
Jobs crash when they are configured to store restart data. I have tried this in the samples module, by editing the simple-container-definition.xml (recoveryPolicy -> storeRestartData -> true). The result was a NullPointerException (restartData.getProperties() returns null).
I believe the root of the problem lies in the SimpleStepExecutor#process(...) method:
if (shouldPersistRestartData) {
    restoreFromRestartData(module, step.getRestartData());
}
It does not make sense to me that restoreFromRestartData(...) is called based on whether restartData should be saved. I would expect a check whether we are doing restart and whether there is some restartData stored.
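One way to express the check the reporter expects is sketched below. It is not the project's actual fix: `StoredRestartData` is a hypothetical stand-in for the restart data holder, and the `isRestart` flag is an assumption about information the step executor would have available.

```java
import java.util.Properties;

// Hypothetical minimal stand-in for the restart data holder described above.
class StoredRestartData {
    private final Properties properties;

    StoredRestartData(Properties properties) {
        this.properties = properties;
    }

    Properties getProperties() {
        return properties;
    }
}

// Hypothetical guard, independent of whether restart data should be persisted.
class RestartGuard {
    // Restore only when this execution is a restart and usable data was stored.
    static boolean shouldRestore(boolean isRestart, StoredRestartData restartData) {
        return isRestart
                && restartData != null
                && restartData.getProperties() != null;
    }
}
```

The null check on `getProperties()` covers the case from the report, where the properties were null and caused the NullPointerException.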
Affects: 1.0-m2
Dave Syer opened BATCH-46 and commented
RepeatContextAware as lifecycle option for step scoped beans. The RepeatContext could be injected by the StepScope at the time the object is created.
Affects: 1.0-m2
Lucas Ward opened BATCH-2 and commented
There are currently interfaces for both ItemProvider/Processor and DataProvider/Processor. I understand some of the reason for keeping ItemProvider is due to the additional methods, getKey and recover, which are useful in skipping. However, having these two sets of interfaces is confusing and needs to be resolved before releasing m2. Perhaps recover and getKey can be moved to another interface and 'mixed-in' to the AbstractProvider/Processor?
Affects: 1.0-m1
Lucas Ward opened BATCH-9 and commented
A simple JMX implementation of management and monitoring needs to be implemented for Milestone 2. At a minimum, being able to stop a job at the next commit interval and shut down gracefully should be implemented.
Affects: 1.0-m1
Lucas Ward opened BATCH-3 and commented
The appropriate licensing/copyright headers need to be added before releasing M2.
Affects: 1.0-m1
Dave Syer opened BATCH-50 and commented
Allow fixed-length file columns with column ranges, e.g. 1-8,11-15 (instead of 1,9,11).
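A minimal sketch of how such a range specification could be parsed and applied. `ColumnRanges` is a hypothetical helper, not an existing Spring Batch class, and the choice of 1-based inclusive columns is an assumption read from the example `1-8,11-15`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring Batch.
class ColumnRanges {
    // Parses a spec such as "1-8,11-15" into inclusive, 1-based {start, end} pairs.
    static List<int[]> parse(String spec) {
        List<int[]> ranges = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] bounds = part.trim().split("-");
            if (bounds.length != 2) {
                throw new IllegalArgumentException("Expected start-end but got: " + part);
            }
            int start = Integer.parseInt(bounds[0].trim());
            int end = Integer.parseInt(bounds[1].trim());
            if (start < 1 || end < start) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    // Cuts one fixed-length line into fields; a line that is too short yields shorter or empty fields.
    static List<String> tokenize(String line, List<int[]> ranges) {
        List<String> fields = new ArrayList<>();
        for (int[] range : ranges) {
            int from = Math.min(range[0] - 1, line.length());
            int to = Math.min(range[1], line.length());
            fields.add(line.substring(from, to));
        }
        return fields;
    }
}
```

Unlike a list of start positions, explicit ranges let filler columns (here columns 9 and 10) be skipped without defining a dummy field for them.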
Affects: 1.0-m2
Douglas C. Kaminsky opened BATCH-41 and commented
Duplicated class: TransactionValidException
org.springframework.batch.io.exception
org.springframework.batch.container.common.exception
No further details from BATCH-41
Dave Syer opened BATCH-47 and commented
Allow RepeatInterceptor to change the return value from RepeatCallback - then custom interceptor could be used to set an exit code.
Affects: 1.0-m2
Lucas Ward opened BATCH-1 and commented
There are two versions of BatchStatus in the following packages:
For ease in understanding, one of them should be renamed.
Affects: 1.0-m1
Dave Syer opened BATCH-19 and commented
Use in-memory database instead of default derby for better startup/shutdown performance. Especially shutdown takes a very long time because of derby internals (it seems), and it makes it very difficult to run tests quickly.
There is a derby in-memory driver, but it isn't part of the released product yet. Or we could use HSQLDB (but that is weak on locking and savepoints, so not good for some integration tests).
Affects: 1.0-m1
Lucas Ward opened BATCH-11 and commented
BeanWrapperFieldSetMapper doesn't support mapping to collections. A test case has been added to its unit test showing usage, but is commented out. However, it's a fairly minor issue, and I'm only aware of one client that needs the functionality. It should probably not be included in Milestone 2.
No further details from BATCH-11
Lucas Ward opened BATCH-8 and commented
When outputting to a file there are a few scenarios that need to be thought through regarding the creation or updating of a file. For example, if a job runs and outputs 10 lines and then fails, the next run should start at the end of the created file. However, if it's a new instance of the same job (i.e. not restartable) then the old version should be deleted and another created. One option is to forgo even attempting to make this calculation and require manual intervention in this scenario. Another would be appending a random number to the file names of non-restartable jobs. The final option could be a smart ResourceLocator that knows about restart and makes the determination based on configuration.
Affects: 1.0-m1
Lucas Ward opened BATCH-5 and commented
Currently, the container project is named 'container scratch' in Subversion; this should be changed to 'container'.
Affects: 1.0-m1
Lucas Ward opened BATCH-7 and commented
Currently, all input templates (simpleFileInput is an exception) store their state in a ThreadLocal. However, this impacts readability and is overkill for most cases. There are two possible alternatives. One is to remove the ThreadLocal, with the expectation that either the input or the entire module will be wrapped in a thread-local scope (since many cases may result in a non-thread-safe module being created and then needing partitioning for performance reasons). The other is to create the InputState as a dependency of the inputs (something like SqlInputState), which would default to a simple, non-thread-safe implementation but could be replaced with an implementation that uses either a wrapper or composite pattern to wrap the state in a ThreadLocal.
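The second option (state as a replaceable dependency) might be sketched like this. The `InputState` interface and both implementations are hypothetical; the row counter is only a placeholder for whatever state a real input template keeps.

```java
import java.util.function.Supplier;

// Hypothetical state contract for an input source.
interface InputState {
    int getCurrentRow();

    void advance();
}

// Default implementation: simple and not thread safe.
class SimpleInputState implements InputState {
    private int currentRow;

    @Override
    public int getCurrentRow() {
        return currentRow;
    }

    @Override
    public void advance() {
        currentRow++;
    }
}

// Wrapper implementation: same contract, but every thread works on its own delegate.
class ThreadLocalInputState implements InputState {
    private final ThreadLocal<InputState> delegate;

    ThreadLocalInputState(Supplier<? extends InputState> factory) {
        this.delegate = ThreadLocal.withInitial(factory);
    }

    @Override
    public int getCurrentRow() {
        return delegate.get().getCurrentRow();
    }

    @Override
    public void advance() {
        delegate.get().advance();
    }
}
```

The input template would depend only on `InputState`, so the thread-local behaviour becomes a configuration choice (`new ThreadLocalInputState(SimpleInputState::new)`) instead of being built into every template.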
Affects: 1.0-m1
Dave Syer opened BATCH-35 and commented
Create HBM mappings for version and handle with non-null value in SqlDao. For auditability it makes sense for the Job/Step entities, and also for the others - if HibernateDao is used it will be automatic. The Sql*Dao just needs to make sure that the version is not null - otherwise if Hibernate is ever used it will have problems.
Affects: 1.0-m2
Dave Syer opened BATCH-22 and commented
Find alternative to ThreadLocal for RepeatSynchronizationManager - not allowed in WebSphere. Also essential for scope="batch" as required by fully-fledged container to allow state management of input/output sources in a potentially multi-threaded environment.
Affects: 1.0-m1
Lucas Ward opened BATCH-38 and commented
Currently, the only strategy a developer has for controlling what happens in a rollback or skip scenario is by modifying an exception handler. However, this makes the developer think outside the 'configuration domain' and think about things in terms of the 'batch execution domain' i.e. the container or infrastructure running the job. Instead, there should be integers in the StepConfiguration to control failures (i.e. when a transaction fails) and skips.
Affects: 1.0-m1
Lucas Ward opened BATCH-10 and commented
The ResourceLineReader class throws an exception from its close() method. Because StepLifecycle classes call module.close() (which in turn will likely end up calling close all the way down to the input template and its reader) from a finally block, this could cause serious issues.
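The main risk is that an exception thrown from a finally block replaces whatever exception was already propagating, so the original failure is lost. One common way to avoid this is sketched below; `QuietCloser` is a hypothetical helper, not an existing Spring Batch class, and handing the failure back to the caller is a design choice of this sketch.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper, not part of Spring Batch.
class QuietCloser {
    // Closes the resource without throwing; returns the failure (or null) so the
    // caller can log it without masking an exception that is already in flight.
    static IOException closeQuietly(Closeable resource) {
        if (resource == null) {
            return null;
        }
        try {
            resource.close();
            return null;
        } catch (IOException e) {
            return e;
        }
    }
}
```

A caller would invoke `QuietCloser.closeQuietly(reader)` inside its finally block and decide there whether a close failure deserves more than a log entry.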
Affects: 1.0-m1
Lucas Ward opened BATCH-4 and commented
The packaging structure of the infrastructure and especially the container project needs to be reviewed before release. A few preliminary reviews have brought up questions such as "are all the members of the common package truly common to all containers?" along with some questions about whether a couple of the packages could be combined. For example, should job and step configuration be in the domain package? Regardless, in order to ensure we have minimal impacts of any changes after releasing the milestone, there should be a deep review session.
Affects: 1.0-m1
Dave Syer opened BATCH-29 and commented
All domain entities and scoping abstractions (RepeatContext in particular) need to be Serializable. Things work now in a simple execution environment, but in a cluster or distributed environment we will quickly be able to identify the things that need to be Serializable. Also some JMX operations might work better if the return types were Serializable.
Affects: 1.0-m2
Lucas Ward opened BATCH-16 and commented
Both the Xml and FlatFile output templates deal with creating a file, and in certain cases dealing with what to do with a file that already exists. Since both do the same thing, it seems logical that they could both extend an abstract class that deals with the files in a uniform way.
Affects: 1.0-m1
Issue Links:
Referenced from: commits b89bbba
Dave Syer opened BATCH-24 and commented
Provide exit code for job executed as a main method. Some strategy to allow operators / configuration to override and add error codes, e.g. mapped to exception types.
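A minimal sketch of an exit code strategy mapped to exception types. `ExceptionExitCodeMapper` is a hypothetical class, and the conventions used (0 for success, first matching registration wins) are assumptions of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mapper, not part of Spring Batch.
class ExceptionExitCodeMapper {
    private final Map<Class<? extends Throwable>, Integer> codes = new LinkedHashMap<>();
    private final int defaultErrorCode;

    ExceptionExitCodeMapper(int defaultErrorCode) {
        this.defaultErrorCode = defaultErrorCode;
    }

    // Register more specific exception types first: lookup follows insertion order.
    ExceptionExitCodeMapper map(Class<? extends Throwable> type, int code) {
        codes.put(type, code);
        return this;
    }

    // Returns 0 when there was no failure, otherwise the code of the first matching type.
    int exitCodeFor(Throwable failure) {
        if (failure == null) {
            return 0;
        }
        for (Map.Entry<Class<? extends Throwable>, Integer> entry : codes.entrySet()) {
            if (entry.getKey().isInstance(failure)) {
                return entry.getValue();
            }
        }
        return defaultErrorCode;
    }
}
```

A main method could catch the job's failure and finish with `System.exit(mapper.exitCodeFor(failure))`, while operators override or extend the mappings through configuration.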
Affects: 1.0-m1
Issue Links:
BATCH-45 Create a POJO adapter for batch Tasklet
("is depended on by")
BATCH-40 An application developer must be able to control the 'exit codes' returned by the container
BATCH-14 Create efficient framework for launching / bootstrapping jobs in the common case of just launching a main method.
1 votes, 2 watchers
Dave Syer opened BATCH-36 and commented
Create HBM mappings for version and handle with non-null value in SqlDao. For auditability it makes sense for the Job/Step entities, and also for the others - if HibernateDao is used it will be automatic. The Sql*Dao just needs to make sure that the version is not null - otherwise if Hibernate is ever used it will have problems.
Affects: 1.0-m2
Dave Syer opened BATCH-48 and commented
Add terminateOnly flag to RepeatContext. When a job is stopped it is different from a business operation voting to mark it as complete - the terminate flag probably should also set the complete flag, but clients (StepExecutor, JobExecutor) will do different things with the result. Also implies a new value in the BatchStatus enum (INTERRUPTED as distinct from STOPPED, FAILED).
Affects: 1.0-m2
Lucas Ward opened BATCH-12 and commented
Currently, the SqlSelectStatement interface extends Cloneable. This is bad practice. It would be preferable for implementations to have a copy constructor.
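For illustration, the copy-constructor idiom looks roughly like this. `SimpleSelectStatement` is a hypothetical class invented for the sketch; the real SqlSelectStatement implementations and their fields are not shown in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical implementation class, used only to illustrate the idiom.
class SimpleSelectStatement {
    private final String sql;
    private final List<Object> parameters;

    SimpleSelectStatement(String sql, List<Object> parameters) {
        this.sql = sql;
        // Defensive copy, so the statement does not share the caller's list.
        this.parameters = new ArrayList<>(parameters);
    }

    // Copy constructor: an explicit, type-safe alternative to clone().
    SimpleSelectStatement(SimpleSelectStatement other) {
        this(other.sql, other.parameters);
    }

    void addParameter(Object value) {
        parameters.add(value);
    }

    String getSql() {
        return sql;
    }

    List<Object> getParameters() {
        return new ArrayList<>(parameters);
    }
}
```

Compared with `clone()`, the copy is an ordinary constructor call with no cast and no checked CloneNotSupportedException, and it makes explicit which fields are copied deeply.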
No further details from BATCH-12
Lucas Ward opened BATCH-30 and commented
The JMS integration tests in the integration project periodically 'hang' when run from maven. Killing the process and rerunning the tests will usually clear up the error, but it happens often enough to warrant being looked into. A trace is included below:
Running org.springframework.retry.jms.SynchronousTests.....
2007-07-03 13:10:58,781 INFO [org.springframework.retry.jms.SynchronousTests] - <Loading context for locations: /org/springframework/batch/jms/jms-context.xml>
2007-07-03 13:10:58,890 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader] - <Loading XML bean definitions from class path resource [org/springframework/batch/jms/jms-context.xml]>
2007-07-03 13:10:59,218 INFO [org.springframework.context.support.GenericApplicationContext] - <Refreshing org.springframework.context.support.GenericApplicationContext@601bb1: display name [org.springframework.context.support.GenericApplicationContext@601bb1]; startup date [Tue Jul 03 13:10:59 CDT 2007]; root of context hierarchy>
2007-07-03 13:10:59,218 INFO [org.springframework.context.support.GenericApplicationContext] - <Bean factory for application context [org.springframework.context.support.GenericApplicationContext@601bb1]: org.springframework.beans.factory.support.DefaultListableBeanFactory@1479feb>
No further details from BATCH-30
Lucas Ward opened BATCH-40 and commented
Currently, it is easy to return 'error codes' by mapping a returned exception back to an error code. However, there needs to be a way to support returning different exit codes even if the job finished successfully. For example, a developer may have a business reason to return a different exit code, so that the scheduler takes a different branch.
Affects: 1.0-m1