This issue has been moved from the zendframework
repository as part of the bug migration program as outlined here - http://framework.zend.com/blog/2016-04-11-issue-closures.html
Original Issue: https://api.github.com/repos/zendframework/zendframework/issues/4772
User: @bakura10
Created On: 2013-07-02T12:52:46Z
Updated At: 2014-09-28T00:34:50Z
Body
Hi everyone,
This is a branch I'm working on on my free time, to refactor the input filter component for ZF 3. I mostly push this for people who want to see the code and help.
It is not yet usable because it relies on some refactoring I'm doing in both Validator and Filter components. It completely embraces the service manager, so now we should not have the VERY annoying problem of adding custom validators/filters, and not being able to use them using the short name because it uses plugin managers that were not fetched from SM.
Under the hood, the component is completely rewritten and now uses SPL RecursiveFilter. From my first experiments, this allow an around x5-8 performance improvements and a memory consumption divided by 13 for complex input filters (magic happen here: https://github.com/bakura10/zf2/blob/943abe025ed578c4068cf19df079b34726d449a3/library/Zend/InputFilter/BaseInputFilter.php#L274-L300)
What I've seen is that the input filter component got more and more bloated, and it is now slow as hell. Checking the isValid method as it is now makes it quite clear where the problem is.
Therefore, for ZF3, I suggest that the base input filter to stay as simple as possible, and people wanting to have custom behaviors to extend the class and override the isValid for their use-cases (as a more general rule, I'd like this for most components too).
To help this, it also uses internally SPL RecursiveFilter. Basically, it allows to have very powerful validation group rules. For instance, this branch features a ValidationGroupRegex filter, and usage would be as follow:
$inputFilter = new InputFilter();
// … populate the input filter with lot of fields
// You can use the as-usual validation group:
$inputFilter->setValidationGroup(array('username', 'email'));
// But also some more complex rules. Here it only validate fields that contain 'registration-'
// in their names
$validationGroupRegex = new ValidationGroupRegexFilter($inputFilter, '/registration-/i')
$inputFilter->setValidationGroupFilter($validationGroupRegex);
Those filters are really easy to write. They must extend AbstractValidationGroupFilter and implements the "accept" method (like this: https://github.com/bakura10/zf2/blob/943abe025ed578c4068cf19df079b34726d449a3/library/Zend/InputFilter/Filter/ValidationGroupRegexFilter.php)
What do you think ? :)
NOTE : not all features have been back ported yet.
Comment
User: @desem
Created On: 2013-07-02T13:33:40Z
Updated At: 2013-07-02T13:33:40Z
Body
And I like it :)
http://www.ivangospodinow.com/simple-form-validator-for-zend-framework-2-forms
Comment
User: @bakura10
Created On: 2013-07-02T13:38:22Z
Updated At: 2013-07-02T13:38:22Z
Body
Thanks :). I change a little bit the logic so now any recursive iterator filter can be specific, so you can use the RecursiveCallbackFilterIterator:
$filter = new RecursiveCallbackFilterIterator($inputFilter, function($value, $key)) {
if ($key === 'toto') {
return false;
}
return true;
}
This will only validate fields except the one different than "toto".
Pretty powerful :D.
Comment
User: @desem
Created On: 2013-07-02T13:56:33Z
Updated At: 2013-07-02T13:56:33Z
Body
It's interesting. But I would start with a common forum for zend :)
Comment
User: @texdc
Created On: 2013-07-04T19:13:03Z
Updated At: 2013-07-04T19:13:03Z
Body
Please see my notes.
Comment
User: @bakura10
Created On: 2013-07-04T19:23:45Z
Updated At: 2013-07-04T19:23:45Z
Body
I answered :).
Comment
User: @bakura10
Created On: 2013-08-09T09:51:01Z
Updated At: 2013-08-09T09:51:01Z
Body
I tried to add more exciting stuff to this refactor. Now you can add a callback recursive filter to act as a validation group:
$inputFilter = new InputFilter();
$recursiveFilter = new ValidationGroupCallbackFilter($inputFilter, function($value, $key, $iterator) {
// This get called for each element, so you can reject easily
if ($key === 'lol') {
return false;
}
return true;
});
$inputFilter->isValid();
Comment
User: @bakura10
Created On: 2013-08-09T10:17:43Z
Updated At: 2013-08-09T10:17:43Z
Body
Based on ocramius feedback, I've renamed the InputFilter class to InputCollection, which may be clearer for people. What do you think?
The only problem is that people may be confused between the InputCollection and CollectionInputCollection (previously CollectionInputFilter). I'll try to have a better idea for this.
Comment
User: @texdc
Created On: 2013-08-09T15:06:32Z
Updated At: 2013-08-09T15:06:32Z
Body
I like the 'InputCollection' name! I've just started to use 'CollectionInputFilter', so I understand the issue. Why not just 'CollectionInput'?
On Aug 9, 2013, at 5:17 AM, Michaël Gallego [email protected] wrote:
Based on ocramius feedback, I've renamed the InputFilter class to InputCollection, which may be clearer for people. What do you think?
The only problem is that people may be confused between the InputCollection and CollectionInputCollection (previously CollectionInputFilter). I'll try to have a better idea for this.
—
Reply to this email directly or view it on GitHub.
Comment
User: @bakura10
Created On: 2013-08-10T08:21:41Z
Updated At: 2013-08-10T08:21:41Z
Body
@texdc , I'm not even sure the CollectionInputFilter is really needed. After all, the collection input filter does nothing more than applying the same input filter to a collection. Can't we simply run the same input filter against different set of data ? (if I managed to make the input filter stateless it's going to be even more true… but turning it stateless is really hard…)
Comment
User: @bakura10
Created On: 2013-08-10T08:39:19Z
Updated At: 2013-08-10T08:42:23Z
Body
By the way, here are some performance figures. I simply created a single input filter. Please note that my current implementation does a little more than the current one as each element are created using a factory (both input and input collection), so that now the global validator and filter plugin managers are always correctly injected (this was always hard in current implementation). Please note also that a major time during construction is instead spent on the service locator, therefore I've decided to create the element using new instead of using the plugin manager (after doing this test I realized how service manager is slow).
In order not to make the test wrong, no validator nor filters are attached to elements. This test just take into account the traversal time.
With 50 elements
ZF2 :
Input filter construction time: 0.0015020370483398 msec
Traversal time: 0.0062439441680908 msec
Memory consumption: 3 Mb
ZF3 :
Input filter construction time: 0.00066399574279785 msec
Traversal time: 0.0010840892791748 msec
Memory consumption: 2,75 Mb
With 500 elements
ZF2 :
Input filter construction time: 0.010568141937256 msec
Traversal time: 0.058469772338867 msec
Memory consumption: 6 Mb
ZF3 :
Input filter construction time: 0.0047299861907959 msec
Traversal time: 0.0099430084228516 msec
Memory consumption: 3,75 Mb
With 5000 elements
ZF2 :
Input filter construction time: 0.1059877872467 msec
Traversal time: 0.60562801361084 msec
Memory consumption: 37 Mb
ZF3 :
Input filter construction time: 0.048290014266968 msec
Traversal time: 0.10112690925598 msec
Memory consumption: 11,75 Mb
We can expect the difference to be even more when using nested input filters (which is quite common), but already, the implementation that uses RecursiveIteratorIterator, once most important features were back ported, traversal is around 6 times faster and construction 2 times faster, and consumes a lot less memory with a lot of elements.
Furthermore, all the validation group features, even with complex ones like Callback, come nearly for free with RecursiveIteratorIterator.
Comment
User: @bakura10
Created On: 2013-09-02T15:54:03Z
Updated At: 2013-09-02T15:54:03Z
Body
Hi everyone,
I've taken into account a lot of feedbacks and I've completely rewritten some parts of it.
The big news is that now input filter component is completely stateless. Both input and input collection. Because of this, I had to abandon the RecursiveIteratorIterator implementation that was not flexible enough. Now it uses IteratorIterator instead. As a consequence, perf improvements are a bit less drastic (especially in construction time).
How to use it
Now, input filter component is stateless. This means that there is no longer "isValid" method. Instead, you are expected to use the "validate" method:
// Note: the Factory dependency will be hanlded by a ZF factory
$inputCollection = new InputCollection(new Factory(new InputFilterManager()));
$inputCollection->add(array(
'name' => 'foo'
));
$data = array('foo' => 'myValue');
$validationResult = $inputCollection->validate($data, $optionalContext);
if ($validationResult->isValid()) {
$values = $validationResult->getValues();
// or $rawValues = $validationResult->getRawValues()
} else {
$error = $validationResult->getErrorMessages();
}
Because validation result is completely decoupled, it can now be serialized or converted to JSON:
json_encode($validationResult); // Output error messages in JSON
Validation result
One of the main goal of this refactor was, as I said it, to simplify it and remove a lot of crap because of everyone adding its edge cases.
Instead, I've written several methods that make it easy to extend default behaviour:
- People can extend InputCollection and override the "buildValidationResult", if they need to construct a custom validation result that also contains unknown inputs, for instance.
- You can also extend "buildErrorMessages", which is called for each input collection.
Also, the validation group feature got more powerful thanks to SPL filters. Iteration is not made using a simple foreach, but using an IteratorIterator:
$iterator = $this->getValidationGroupFilter();
$iteratorIterator = new IteratorIterator($iterator);
$errorMessages = array();
/** @var InputInterface|InputCollectionInterface $inputOrInputCollection */
foreach ($iteratorIterator as $inputOrInputCollection) {
SPL filter iterators are automatically executed for us and allow to accept or refuse a specific key.
By default, ZF attaches a "Zend\InputFilter\ValidationGroup\NoOpFilterIterator" filter that accepts all fields (it is the same as setting an empty array as validation group). But you can also attach more complex filter. For instance using an array:
$arrayFilter = new ArrayFilterIterator($inputCollection, array('foo'));
$inputCollection->setValidationGroupFilter($arrayFilter);
Now, only the "foo" input of this input collection will be validated.
It comes with other filter:
$callableFilter = new CallbackFilterIterator($inputCollection, function($key, $value) {});
$regexFilter = new RegexFilterIterator($inputCollection, '/user/');
This should open the way to very fine filtering, as an input collection can contain another input collection that contains its own validation group filter. In the future, I'll write the code to be able to specifcy an input filter through config:
$inputCollection->add(
'type' => 'Zend\InputFilter\InputCollection',
'name' => 'top',
'children' => array(
array('name' => 'bar')
),
'validation_group_filter' => array(
'type' => 'Zend\InputFilter\ValidationGroup\RegexFilterIterator',
'options' => array(
'regex' => '/blabla/'
)
)
));
Performance
I've made some performance test, once again without adding any validators/filters so that the test focus only on input filter:
ZF2:
5 elements,
3 input collections with 5 elements inside
Construction time : 0.00055599212646484 secondes
Traversal time : 0.0064549446105957 secondes
Memory: 3 Mb
50 elements,
10 input collections with 50 elements inside
Construction time : 0.0096118450164795 secondes
Traversal time : 0.11807513237 secondes
Memory: 6,5 Mb
100 elements,
25 input collections with 100 elements inside
Construction time : 0.033029079437256 secondes
Traversal time : 0.51178097724915 secondes
Memory: 19,75 Mb
ZF3
5 elements,
3 input collections with 5 elements inside
Construction time : 0.00034809112548828 secondes
Traversal time : 0.0011918544769287 secondes
Memory: 2,5 Mb
50 elements,
10 input collections with 50 elements inside
Construction time : 0.0085010528564453 secondes
Traversal time : 0.01695990562439 secondes
Memory: 2,75 Mb
100 elements,
25 input collections with 100 elements inside
Construction time : 0.031938076019287 secondes
Traversal time : 0.060458183288574 secondes
Memory: 4,25 Mb
Comment
User: @Ocramius
Created On: 2013-09-02T16:17:17Z
Updated At: 2013-09-02T16:17:17Z
Body
As discussed with @bakura10 on IRC, we should have the same interface for Input
and InputFilter
. An InputFilter
is just a normal Input
with an array
or Traversable
value. Otherwise, I see no difference with the Input
API.
Any pluralization of method names in the API shall be removed.
Comment
User: @bakura10
Created On: 2013-09-02T16:22:36Z
Updated At: 2013-09-02T16:24:21Z
Body
Done.
So now InputCollection extends Input, and have the same "validate" method. I didn't change it yet but Input should return a ValidationResult too.
The question now is that InputFilter will receive an array of ValidationResult, so we must find an efficient way to inject the result into the main ValidationResult.
As @Ocramius suggested, I've alos removed the fallback value capability. In fact this is a Filter job. So maybe we should create a FallbackValueFilter, but this sohuld not be handled in Input Filter component.
Comment
User: @bakura10
Created On: 2013-09-02T17:47:51Z
Updated At: 2013-09-02T17:48:47Z
Body
I have integrated several new things:
- Input collection can now have their own validators and filters. When an input collection is validated, it FIRST execute its own validators, and then execute the validators of all its children. On the other hand, when data is filtered, all children are filtered, THEN the filters attached to the collection is executed. This allow to have, for instance, a Date input collection with three inputs, and validate or filter based on this context. This should open some nice simplifications to the form component, especially with the Collection element.
- The validation process is a bit complicatd, I'm not sure about hte best way to handle it.
Currently, whenever an input collection is validated, a new ValidationResult is built. If the input collection contains no errors, then the values are filtered. This means that in the following hiearrchy:
Input filter 1
input 1
input 2
input filter 2
input 3
input 4
if input 3, 4 and input filter 2 are valid, then the values are filtered EVEN THOUGH input 1 failed. The filter step is "on an input collection" basis.
This may not be the most efficient way because we need to do the recursion twice... If you have any better idea, please shout!
I also need people to evaluate this: https://github.com/bakura10/zf2/blob/d0cfc2e6643732b1e5e020f6154a6617e7e4d20d/library/Zend/InputFilter/InputCollection.php#L157
Comment
User: @bakura10
Created On: 2013-09-02T19:14:05Z
Updated At: 2013-09-02T19:14:05Z
Body
I've been using IteratorAggregate, which is around 30% faster than implement IteratorIterator. We are now even faster than when using RecursiveIteratorIterator :D. Some funny benchmark with HEAVY input filters (450 inputs, and 50 sub-input collections with 450 elements inside):
ZF2:
Construction time : 0.47534704208374 secondes
Traversal time : 7.0274820327759 secondes
Memory consumption: 154 Mb
ZF3:
Construction time : 0.48701810836792 secondes
Traversal time : 0.83196616172791 secondes
Memory consumption: 18 Mb
Comment
User: @Netiul
Created On: 2013-09-03T06:58:37Z
Updated At: 2013-09-03T06:58:37Z
Body
Impressive, I must say. Keep up the great work!
Comment
User: @bakura10
Created On: 2013-09-03T09:58:39Z
Updated At: 2013-09-03T09:58:39Z
Body
Thanks @Netiul .
I've just noticed something strange, I'd like your feedback. Since this time, I've always thought that the input filters does : validation using the raw data, then filtering. However, it seems that I was wrong: https://github.com/zendframework/zf2/blob/master/library/Zend/InputFilter/Input.php#L309
The filtering is done prior to the validation. Is this something expected? Is this normal behaviour and what we want? I'm always lost about this.
Comment
User: @bakura10
Created On: 2013-09-03T12:26:28Z
Updated At: 2013-09-03T12:26:28Z
Body
Another massive performance improvement for today (around 20%), because the whole input collection is now traversed only once instead of twice.
This is possible thanks to the previous remark about filtering data, some talk with @Ocramius, and the new stateless that allows interesting optimizations.
First, the ValidationResult has been renamed to InputFilterResult. This is because I'll try to make Validator component completely stateless too. It will return ValidationResult, so it was important to make the two different by the naming.
About the performance improvement it comes from the reason of my previous comment:
I realized that current implementation of Input, the "isValid" method filter data before validating it. In fact, it makes sense (if you have an input with a "Alnum" validator and a trimming Filter, you want " abc " to be valid). The thing is that in ZF2, values were filtered twice: once when the "isValid" method of the input filter is called, and another time you do "getValues". This is useless work.
Now, when the "validate" method for an input is called, as the value is filtered anyway, it's stored directly in the ValidationResult, and aggregated by the ValidationResult of the Input Collection. So calling "getData" from the input collection does not incur another recursion.
The second thing is that as I said before, InputCollection now can have their validators and filters (which shuld remove the need of the previous CollectionInputFilter class). I'm not sure about the workflow:
Let's assume the following input collection, called "coll", which contains two inputs "a", "b".
When calling validate, the following happens:
- The InputCollection validators (if any) are run. The array that the validator will receive will by ["a" => ..., "b" => ...].
1.1 If invalid, register error messages.
1.2 If input collection is configured to break on failure, we immediately return an input filter result.
- Otherwise, each inputs (a and b) are filtered and validated.
- If everything is correct, the input filter result is built, and the input collection filters are run for the value (in this case, the filtered values of a and b).
Does the order make sense to everyone?
Another question I have is: if an input has failed (not valid), should we insert the filtered value into the input filter result ?
Remark: because InputFilter is a stateless component now, I think we can safely switch the plugin manager to share instance by default. This should be a pretty memory consumption improvment for people using a lot of input filters.
Comment
User: @dphn
Created On: 2013-09-03T12:35:53Z
Updated At: 2013-09-03T12:35:53Z
Body
Tell me, please, why do not you rename InputFilter to InputSpecification? After all, it contains both filters and validators?
Comment
User: @bakura10
Created On: 2013-09-03T12:37:35Z
Updated At: 2013-09-03T12:37:35Z
Body
??? You're mixing everything again... InputSpecifciation does not exist. No classes have been renamed to InputSpecification.
InputFilter has been however renamed to InputCollection, because it represents more what it is (a collection of inputs).
Comment
User: @dphn
Created On: 2013-09-03T12:38:54Z
Updated At: 2013-09-03T12:38:54Z
Body
Thank you, I get it. So really better.
Comment
User: @weierophinney
Created On: 2013-09-03T16:10:05Z
Updated At: 2013-09-03T16:10:05Z
Body
The filtering is done prior to the validation. Is this something expected? Is this normal behaviour and what we want? I'm always lost about this.
@bakura10 :
Yes, this is normal behavior, and has been since ZF1. The idea is that you do normalization tasks, and then validate the normalized data (this is particularly useful for things like phone numbers and CC numbers, to allow stripping non-digit input such as spaces, dots, parens, etc). I've seen some requests in the past for a post-filter process as well, but the idea has never gained traction.
Comment
User: @bakura10
Created On: 2013-09-03T16:11:22Z
Updated At: 2013-09-03T16:11:22Z
Body
Ok thanks to confirm this then :).
Comment
User: @dphn
Created On: 2013-09-05T16:25:15Z
Updated At: 2013-09-05T16:25:15Z
Body
I wonder, in the third version will remain a mass func_get_arg () in the constructor of each validator?
Comment
User: @Ocramius
Created On: 2013-09-05T16:35:37Z
Updated At: 2013-09-05T16:35:37Z
Body
@dphn not sure if that's relevant to this PR - the main focus here is moving to a stateless interface first. Otherwise we'd need an Options
object for every validator, which is probably a further step for a different PR
Comment
User: @dphn
Created On: 2013-09-05T16:42:16Z
Updated At: 2013-09-05T16:42:16Z
Body
I'm sorry, I just took the most active channel of communication :) Not anymore.
Comment
User: @dphn
Created On: 2013-09-05T22:00:49Z
Updated At: 2013-09-05T22:00:49Z
Body
The filtering is done prior to the validation. Is this something expected? Is this normal behaviour and what we want? I'm always lost about this.
It would be nice if it were true. I'm trying to add a filter in the multiupload form, but it is always executed after validators.
Comment
User: @bakura10
Created On: 2013-09-05T22:02:35Z
Updated At: 2013-09-05T22:02:35Z
Body
Yes, this is expected for file upload though and won't change. For file it makes sense to first validate and then filter (which include things like file upload).
Comment
User: @dphn
Created On: 2013-09-05T22:11:39Z
Updated At: 2013-09-05T22:11:39Z
Body
I think that uploading files is not a task to be performed by the filter. If the filters and validators can be added to Form as listeners, this problem would not exist. But what can I do in the current situation?
Comment
User: @bakura10
Created On: 2013-09-05T22:13:49Z
Updated At: 2013-09-05T22:13:49Z
Body
In fact, I was a bit vague. Filters does not "upload", they rename. Eventually, they could rename and move to something else, which would upload the file to another server. But if you think at it, "moving" or "renaming" can be the job of a filter.
Originally posted by @GeeH at zendframework/zend-inputfilter#111