anrgenstar / genstar

Generation of Synthetic Populations Library

Java 98.93% Shell 0.04% Rich Text Format 1.03%
synthetic-population-library java-8 datascience synthetic-data microsimulation multiagent statistics statistics-library statistics-toolbox demographics

genstar's Introduction

Genstar

Generation of Connected and Spatialized Synthetic Populations

  • Core abstractions are defined in the Core module: IPopulation, IEntity, IAttribute and IValue, together with their direct abstract implementations
  • Population generation algorithms can be found in the Gospl module (generation of synthetic population library)
  • Spatialization algorithms can be found in the Spll module (synthetic population localisation library)
  • Network algorithms can be found in the Spin module (synthetic population interaction network)

Maven setup

  • Clone the repository
  • Import it as a Maven project in your favorite IDE

That's it: all dependencies are handled by Maven.

  • If you want to add dependencies, insert the corresponding Maven dependency into the pom.xml of the project you are working on (the main pom if it concerns all projects)

Maven compile and deploy

  • Run mvn clean compile in the root folder to build the different Gen* jars (see the /target folder)
  • If you have a settings.xml correctly configured with Bintray credentials, you can deploy the Gen* jars to Bintray using the mvn deploy command

genstar's People

Contributors

alexisdrogoul, benoitgaudou, chapuisk, dependabot[bot], genstar-lib-bot, julienperret, ptaillandier, reyman, samthiriot

genstar's Issues

Import the INSEE dictionary format

Instead of reading only the genstar XML format, add support for parsing the INSEE description of variables, of the possible values they can take, and of their mapping

IPF on Rouen: failure of generation

After removing the System.exit() commands from the rouen.gospl test in templates, I get the following problem:


19:10:58.840 [main] ERROR gospl.algo.ipf.margin.MarginalsIPFBuilder - Detailed marginals: gospl.algo.ipf.margin.ComplexMargin 
 [015 (agerevq), Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), Chômeurs (tact), Femmes ou hommes au foyer (tact), Autres inactifs (tact)] = 0.07035885954062221
[025 (agerevq), 030 (agerevq), Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact), 035 (agerevq)] = 0.05975120950695956
[080 (agerevq)] = 0.02713938587135216
[Retraités ou préretraités (tact), 020 (agerevq)] = 0.0
[060 (agerevq)] = 0.06066705694603719
[060 (agerevq), Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), Chômeurs (tact), Femmes ou hommes au foyer (tact), 055 (agerevq), Autres inactifs (tact)] = 0.017594607712661855
[090 (agerevq), 080 (agerevq), 070 (agerevq), Retraités ou préretraités (tact), 115 (agerevq), 105 (agerevq), 095 (agerevq), 085 (agerevq), 075 (agerevq), 065 (agerevq), 120 (agerevq), 110 (agerevq), 100 (agerevq)] = 0.18834645915859025
[025 (agerevq), 030 (agerevq), Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), Chômeurs (tact), Femmes ou hommes au foyer (tact), Autres inactifs (tact), 035 (agerevq)] = 0.02191986722823279
[015 (agerevq), Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact)] = 2.8965538837307615E-4
[025 (agerevq)] = 0.06714836764316703
[090 (agerevq), 080 (agerevq), 070 (agerevq), Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), Chômeurs (tact), Femmes ou hommes au foyer (tact), Autres inactifs (tact), 115 (agerevq), 105 (agerevq), 095 (agerevq), 085 (agerevq), 075 (agerevq), 065 (agerevq), 120 (agerevq), 110 (agerevq), 100 (agerevq)] = 0.005671765645305234
[005 (agerevq)] = 0.05945291386320441
[050 (agerevq), 040 (agerevq), Retraités ou préretraités (tact), 045 (agerevq)] = 0.0020784731246770733
[040 (agerevq)] = 0.06565514807894336
[085 (agerevq)] = 0.01714074917600263
[020 (agerevq)] = 0.07516780676359495
[065 (agerevq)] = 0.04267303944762103
[000 (agerevq)] = 0.06118992437668909
[045 (agerevq)] = 0.06625179434950319
[060 (agerevq), Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact), 055 (agerevq)] = 0.004661886048004509
[115 (agerevq), 105 (agerevq), 120 (agerevq), 110 (agerevq), 100 (agerevq)] = 2.9030369615949057E-4
[025 (agerevq), 030 (agerevq), Retraités ou préretraités (tact), 035 (agerevq)] = 4.892827506301962E-5
[090 (agerevq)] = 0.0066016022839362615
[Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact), 020 (agerevq)] = 7.163099469226072E-4
[070 (agerevq)] = 0.034475568778719615
[090 (agerevq), 080 (agerevq), 070 (agerevq), Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact), 115 (agerevq), 105 (agerevq), 095 (agerevq), 085 (agerevq), 075 (agerevq), 065 (agerevq), 120 (agerevq), 110 (agerevq), 100 (agerevq)] = 4.990684056428E-4
[015 (agerevq), Retraités ou préretraités (tact)] = 0.0
[Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), 020 (agerevq), Chômeurs (tact), Femmes ou hommes au foyer (tact), Autres inactifs (tact)] = 0.03439070597629523
[050 (agerevq), 040 (agerevq), Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), Chômeurs (tact), Femmes ou hommes au foyer (tact), Autres inactifs (tact), 045 (agerevq)] = 0.023415115314158668
[015 (agerevq)] = 0.06765680008340216
[050 (agerevq)] = 0.06746754132016006
[095 (agerevq)] = 0.0014948234520477637
[030 (agerevq)] = 0.06102953559428054
[075 (agerevq)] = 0.03286847317898586
[050 (agerevq), 040 (agerevq), Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact), 045 (agerevq)] = 0.011208489251436533
[010 (agerevq)] = 0.06004154069464382
[055 (agerevq)] = 0.06444581665958282
[035 (agerevq)] = 0.06114180774196653
[060 (agerevq), Retraités ou préretraités (tact), 055 (agerevq)] = 0.06700433700230159
Exception in thread "main" java.lang.RuntimeException: wrong total: Detailed marginals: gospl.algo.ipf.margin.ComplexMargin 
 [015 (agerevq), Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), Chômeurs (tact), Femmes ou hommes au foyer (tact), Autres inactifs (tact)] = 0.07035885954062221
[025 (agerevq), 030 (agerevq), Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact), 035 (agerevq)] = 0.05975120950695956
[080 (agerevq)] = 0.02713938587135216
[Retraités ou préretraités (tact), 020 (agerevq)] = 0.0
[060 (agerevq)] = 0.06066705694603719
[060 (agerevq), Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), Chômeurs (tact), Femmes ou hommes au foyer (tact), 055 (agerevq), Autres inactifs (tact)] = 0.017594607712661855
[090 (agerevq), 080 (agerevq), 070 (agerevq), Retraités ou préretraités (tact), 115 (agerevq), 105 (agerevq), 095 (agerevq), 085 (agerevq), 075 (agerevq), 065 (agerevq), 120 (agerevq), 110 (agerevq), 100 (agerevq)] = 0.18834645915859025
[025 (agerevq), 030 (agerevq), Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), Chômeurs (tact), Femmes ou hommes au foyer (tact), Autres inactifs (tact), 035 (agerevq)] = 0.02191986722823279
[015 (agerevq), Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact)] = 2.8965538837307615E-4
[025 (agerevq)] = 0.06714836764316703
[090 (agerevq), 080 (agerevq), 070 (agerevq), Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), Chômeurs (tact), Femmes ou hommes au foyer (tact), Autres inactifs (tact), 115 (agerevq), 105 (agerevq), 095 (agerevq), 085 (agerevq), 075 (agerevq), 065 (agerevq), 120 (agerevq), 110 (agerevq), 100 (agerevq)] = 0.005671765645305234
[005 (agerevq)] = 0.05945291386320441
[050 (agerevq), 040 (agerevq), Retraités ou préretraités (tact), 045 (agerevq)] = 0.0020784731246770733
[040 (agerevq)] = 0.06565514807894336
[085 (agerevq)] = 0.01714074917600263
[020 (agerevq)] = 0.07516780676359495
[065 (agerevq)] = 0.04267303944762103
[000 (agerevq)] = 0.06118992437668909
[045 (agerevq)] = 0.06625179434950319
[060 (agerevq), Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact), 055 (agerevq)] = 0.004661886048004509
[115 (agerevq), 105 (agerevq), 120 (agerevq), 110 (agerevq), 100 (agerevq)] = 2.9030369615949057E-4
[025 (agerevq), 030 (agerevq), Retraités ou préretraités (tact), 035 (agerevq)] = 4.892827506301962E-5
[090 (agerevq)] = 0.0066016022839362615
[Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact), 020 (agerevq)] = 7.163099469226072E-4
[070 (agerevq)] = 0.034475568778719615
[090 (agerevq), 080 (agerevq), 070 (agerevq), Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact), 115 (agerevq), 105 (agerevq), 095 (agerevq), 085 (agerevq), 075 (agerevq), 065 (agerevq), 120 (agerevq), 110 (agerevq), 100 (agerevq)] = 4.990684056428E-4
[015 (agerevq), Retraités ou préretraités (tact)] = 0.0
[Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), 020 (agerevq), Chômeurs (tact), Femmes ou hommes au foyer (tact), Autres inactifs (tact)] = 0.03439070597629523
[050 (agerevq), 040 (agerevq), Elèves, étudiants, stagiaires non rémunéré de 14 ans ou plus (tact), Chômeurs (tact), Femmes ou hommes au foyer (tact), Autres inactifs (tact), 045 (agerevq)] = 0.023415115314158668
[015 (agerevq)] = 0.06765680008340216
[050 (agerevq)] = 0.06746754132016006
[095 (agerevq)] = 0.0014948234520477637
[030 (agerevq)] = 0.06102953559428054
[075 (agerevq)] = 0.03286847317898586
[050 (agerevq), 040 (agerevq), Actifs ayant un emploi, y compris sous apprentissage ou en stage rémunéré (tact), 045 (agerevq)] = 0.011208489251436533
[010 (agerevq)] = 0.06004154069464382
[055 (agerevq)] = 0.06444581665958282
[035 (agerevq)] = 0.06114180774196653
[060 (agerevq), Retraités ou préretraités (tact), 055 (agerevq)] = 0.06700433700230159
	at gospl.algo.ipf.margin.MarginalsIPFBuilder.buildCompliantMarginals(MarginalsIPFBuilder.java:122)
	at gospl.algo.ipf.AGosplIPF.process(AGosplIPF.java:190)
	at gospl.algo.ipf.DistributionInferenceIPFAlgo.process(DistributionInferenceIPFAlgo.java:43)
	at gospl.algo.ipf.DistributionInferenceIPFAlgo.inferSRSampler(DistributionInferenceIPFAlgo.java:34)
	at gospl.algo.ipf.DistributionInferenceIPFAlgo.inferSRSampler(DistributionInferenceIPFAlgo.java:16)
	at rouen.gospl.IPF.main(IPF.java:104)

provide an easier way to read a configuration file

This is what it looks like now (already after some refactoring; before that, the caller had to know which class to expect!):


new GenstarJsonUtil().unmarchalConfigurationFileFromGenstarJson(
					 Paths.get("src/test/resources/rouen_demographics/rouen_demographics.gns"));

We might instead expect a GenstarConfigurationFile.loadFromJson(...), which would not require the user to discover a utility class, create an instance of it, and "unmarshal"
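
A minimal sketch of what such a one-call entry point could look like. The class name ConfigurationFileSketch is a hypothetical stand-in and not part of Gen*. The body only reads the file text; in the library, the static method would presumably delegate to the existing unmarshalling code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the configuration class; not the real Gen* type. */
final class ConfigurationFileSketch {

    private final String rawJson;

    // Private constructor: callers only go through the static factory.
    private ConfigurationFileSketch(String rawJson) {
        this.rawJson = rawJson;
    }

    /**
     * Single entry point: no helper object to create, no "unmarshal" vocabulary.
     * Assumption: the real implementation would call the existing JSON
     * unmarshalling code here; this sketch only keeps the raw text.
     */
    static ConfigurationFileSketch loadFromJson(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new ConfigurationFileSketch(new String(bytes, StandardCharsets.UTF_8));
    }

    String rawJson() {
        return rawJson;
    }
}
```

The caller then writes a single line, ConfigurationFileSketch.loadFromJson(Paths.get(...)), and never sees the helper class.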

Create a GenstarFileConfiguration without any call to xStream and poi-ooxml

The configuration file only works for gospl right now. First, it must also take the spll configuration into account (e.g. input files, regression objective attribute name, key binding between population and location). Furthermore, the configuration file is a serialized version of a class: move to a template XML design for genstar with a specific schema.

test IPF on rouen: "no value present"

When running in "templates" rouen.gospl.IPF:

java.util.NoSuchElementException: No value present
	at java.util.Optional.get(Optional.java:135)
	at gospl.distribution.GosplDistributionBuilder.getSample(GosplDistributionBuilder.java:364)
	at gospl.distribution.GosplDistributionBuilder.buildSamples(GosplDistributionBuilder.java:121)
	at rouen.gospl.IPF.main(IPF.java:66)
Exception in thread "main" java.util.NoSuchElementException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1439)
	at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
	at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1042)
	at rouen.gospl.IPF.main(IPF.java:86)

Record attributes are not recognized anymore !

Because attributes are parsed in data files using the list of values, record attributes are not recognized. In fact, they have only one value (a record one), which is equal to the attribute name. Unfortunately, it is not encoded as an IValue; this is why it is not recognized in data files.

NDimensionalMatrix: how to find conditional distributions ?

For hierarchical sampling, we progressively define the modalities (~values) for attributes based on the modalities and values already defined during the process.

This is done with a code like this one:
`
// for each of the aspects of this attribute we're working on...
List distribution = new ArrayList<>(att.getValues().size() + 1);
ArrayList a = new ArrayList<>(att2value.values());
for (AGenstarValue val : att.getValues()) {
    // we want the probabilities conditioned on all the previously defined values
    List aa = (List) a.clone(); // quicker than recreating a list from scratch
    // ... and on this specific val
    aa.add(val);
    // TODO sometimes I get a NullPointerException here when one of the values is empty (typically Age3)
    try {
        logger.debug("\t\tfor aspects: {}, getVal returns {}", aa, this.segmentedMatrix.getVal(aa));
        distribution.add(this.segmentedMatrix.getVal(aa).getValue());
    } catch (NullPointerException e) {
        logger.warn("\t\tpotential value {} will be excluded from the distribution as it has no probability", val);
    }
}
`
So for each attribute to evaluate, we call getVal(<set of modalities already identified beforehand + one specific modality of interest>). This probability is logged in lines starting with "for aspects". Here is an example of log:

18:39:22.057 [main] INFO gospl.algo.sampler.GosplHierarchicalSampler - should pick one of the values [Ouvriers (CSP), Agriculteurs exploitants (CSP), Professions intermédiaires (CSP), Artisans. commerçants. chefs d'entreprise (CSP), Retraités (CSP), Autres personnes sans activité professionnelle (CSP), Cadres et professions intellectuelles supérieures (CSP), Employés (CSP)]
18:39:22.058 [main] DEBUG gospl.algo.sampler.GosplHierarchicalSampler - for aspects: [15 à 19 ans (Age), Vivant en couple (Couple), 15 à 19 ans (Age_2), Femmes (Sexe), Ouvriers (CSP)], getVal returns 2.3388007538578066E-4
18:39:22.059 [main] DEBUG gospl.algo.sampler.GosplHierarchicalSampler - for aspects: [15 à 19 ans (Age), Vivant en couple (Couple), 15 à 19 ans (Age_2), Femmes (Sexe), Agriculteurs exploitants (CSP)], getVal returns 4.753916116503556E-6
18:39:22.061 [main] DEBUG gospl.algo.sampler.GosplHierarchicalSampler - for aspects: [15 à 19 ans (Age), Vivant en couple (Couple), 15 à 19 ans (Age_2), Femmes (Sexe), Professions intermédiaires (CSP)], getVal returns 2.5519644960413334E-4
18:39:22.062 [main] DEBUG gospl.algo.sampler.GosplHierarchicalSampler - for aspects: [15 à 19 ans (Age), Vivant en couple (Couple), 15 à 19 ans (Age_2), Femmes (Sexe), Artisans. commerçants. chefs d'entreprise (CSP)], getVal returns 4.207248914543002E-5
18:39:22.065 [main] DEBUG gospl.algo.sampler.GosplHierarchicalSampler - for aspects: [15 à 19 ans (Age), Vivant en couple (Couple), 15 à 19 ans (Age_2), Femmes (Sexe), Retraités (CSP)], getVal returns 4.361369946799801E-4
18:39:22.067 [main] DEBUG gospl.algo.sampler.GosplHierarchicalSampler - for aspects: [15 à 19 ans (Age), Vivant en couple (Couple), 15 à 19 ans (Age_2), Femmes (Sexe), Autres personnes sans activité professionnelle (CSP)], getVal returns 2.936355412156109E-4
18:39:22.069 [main] DEBUG gospl.algo.sampler.GosplHierarchicalSampler - for aspects: [15 à 19 ans (Age), Vivant en couple (Couple), 15 à 19 ans (Age_2), Femmes (Sexe), Cadres et professions intellectuelles supérieures (CSP)], getVal returns 1.4016427713094167E-4
18:39:22.071 [main] DEBUG gospl.algo.sampler.GosplHierarchicalSampler - for aspects: [15 à 19 ans (Age), Vivant en couple (Couple), 15 à 19 ans (Age_2), Femmes (Sexe), Employés (CSP)], getVal returns 2.8803957859032634E-4
18:39:22.072 [main] INFO gospl.algo.sampler.GosplHierarchicalSampler - picked CSP (String) - 8 values = Retraités (CSP)

We can observe here that the probability of having CSP = Retraités is the highest of all candidate values for this woman who is 15-19 years old.
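
To compare the candidates directly, the logged numbers can be rescaled so that they sum to one. The sketch below assumes that getVal returns a joint probability p(context, value) for each candidate, which the log suggests but does not prove. ConditionalFromJoint is a hypothetical helper, not part of GoSPL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConditionalFromJoint {

    /**
     * Turns joint probabilities p(context, v), one per candidate value v,
     * into conditional probabilities p(v | context) by dividing each one
     * by the sum over all candidates.
     */
    static Map<String, Double> normalize(Map<String, Double> joint) {
        double total = 0.0;
        for (double p : joint.values()) {
            total += p;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("context has zero probability");
        }
        Map<String, Double> conditional = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : joint.entrySet()) {
            conditional.put(e.getKey(), e.getValue() / total);
        }
        return conditional;
    }

    public static void main(String[] args) {
        // Values copied from the "for aspects" log lines above.
        Map<String, Double> joint = new LinkedHashMap<>();
        joint.put("Ouvriers", 2.3388007538578066E-4);
        joint.put("Agriculteurs exploitants", 4.753916116503556E-6);
        joint.put("Professions intermédiaires", 2.5519644960413334E-4);
        joint.put("Artisans. commerçants. chefs d'entreprise", 4.207248914543002E-5);
        joint.put("Retraités", 4.361369946799801E-4);
        joint.put("Autres personnes sans activité professionnelle", 2.936355412156109E-4);
        joint.put("Cadres et professions intellectuelles supérieures", 1.4016427713094167E-4);
        joint.put("Employés", 2.8803957859032634E-4);
        normalize(joint).forEach((k, v) -> System.out.printf("%s -> %.4f%n", k, v));
    }
}
```

Under that assumption, Retraités receives roughly a quarter of the probability mass for this 15-19 year old, which makes the anomaly easier to see than the raw values do.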

This very example can be reproduced by running the test case test/gospl.algo.sampler.TestHierarchicalSampler.

Independent sampling on Rouen fails

When running the Rouen.gospl.IS.java case in "template", an error occurs:

Exception in thread "main" java.lang.NullPointerException: Aspect collection [100 ans ou plus (Age)] of size 1 is absent from this matrix (size = 14 - attribute = [Age_2 (Integer) - 7 values, Couple (String) - 2 values])
	at gospl.distribution.matrix.AFullNDimensionalMatrix.getVal(AFullNDimensionalMatrix.java:277)
	at gospl.distribution.matrix.AFullNDimensionalMatrix.getVal(AFullNDimensionalMatrix.java:260)
	at gospl.distribution.GosplConditionalDistribution.getVal(GosplConditionalDistribution.java:118)
	at gospl.distribution.matrix.ASegmentedNDimensionalMatrix.getVal(ASegmentedNDimensionalMatrix.java:247)
	at gospl.algo.is.IndependantHypothesisAlgo.inferSRSampler(IndependantHypothesisAlgo.java:114)
	at gospl.algo.is.IndependantHypothesisAlgo.inferSRSampler(IndependantHypothesisAlgo.java:1)
	at rouen.gospl.IS.main(IS.java:91)

metamodel for hierarchical/structured populations

To be able to create households made of individuals, or households in dwellings and dwellings in buildings, we would need a model of entities that can be linked with each other through different types of links (directed or not, aggregation or association, etc.)

ConcurrentModificationException is thrown once in a while on parallel access to the RasterFile#getPixel(int x, int y) method

There are two reasons:

  1. AGeoValue instances are conceptualized as a limited set of values for an AGeoAttribute (i.e. a band in a raster file). Hence, when we access a new value, we must store it as a possible value of the corresponding attribute.

  2. Genstar's raster file pixels are not stored in memory; they encapsulate, at the time they are requested, the geotools output for the raster file

So, when we access the raster's pixels concurrently, the set of encountered values may be modified by several threads at the same time. This is the case in the SPLAreaMapperBuilder#buildOutput(...) method, which writes the spatial interpolation to an output raster.
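
One possible direction, sketched under the assumption that the encountered values are currently kept in a collection that is not thread-safe (the issue does not say which one). EncounteredValues is a hypothetical class, not the Spll implementation. It uses ConcurrentHashMap.newKeySet() from the standard library.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

final class EncounteredValues {

    // Concurrent set: add() and iteration are safe from several threads,
    // whereas a plain HashSet may throw ConcurrentModificationException
    // or end up in a corrupted state under the same load.
    private final Set<Double> values = ConcurrentHashMap.newKeySet();

    /** Records a pixel value; returns true only the first time it is seen. */
    boolean register(double value) {
        return values.add(value);
    }

    int size() {
        return values.size();
    }

    public static void main(String[] args) {
        EncounteredValues band = new EncounteredValues();
        // Simulates many parallel pixel reads over a band with 256 distinct values.
        IntStream.range(0, 100_000).parallel().forEach(i -> band.register(i % 256));
        System.out.println(band.size()); // prints 256
    }
}
```

The trade-off is a little overhead on every registration in exchange for not needing any external locking around getPixel.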

can we still encode Range values in samples?

When we were creating Range attributes before the huge refactoring, it was possible to give both a list of codes (like "1","2"...) and the corresponding textual counterparts ("less than 10m","11 to 16"...).
Now we are only constructing these ranges with the textual version.
This works well for reading aggregate stats from CSV files, where we expect all the columns to explicitly contain "less than 10m"; but in sample files, the values are often encoded as "1","2"...

Is it still possible to deal with that?

Thanks!
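
For illustration, here is what accepting both encodings could look like. CodedRange is a hypothetical helper and not the Range attribute of the library. It only shows a lookup that accepts either the code or the textual label and always returns the label.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CodedRange {

    // code -> textual label, e.g. "1" -> "less than 10m"
    private final Map<String, String> labelByCode = new LinkedHashMap<>();

    /** Declares one modality with both its code and its label. */
    CodedRange with(String code, String label) {
        labelByCode.put(code, label);
        return this;
    }

    /**
     * Resolves a raw cell read from a file. Sample files may contain the
     * code, aggregate tables the label; both resolve to the same label.
     */
    String resolve(String raw) {
        String trimmed = raw.trim();
        String label = labelByCode.get(trimmed);
        if (label != null) {
            return label;
        }
        if (labelByCode.containsValue(trimmed)) {
            return trimmed;
        }
        throw new IllegalArgumentException("unknown range value: " + raw);
    }

    public static void main(String[] args) {
        CodedRange surface = new CodedRange()
                .with("1", "less than 10m")
                .with("2", "11 to 16");
        System.out.println(surface.resolve("1"));        // code from a sample file
        System.out.println(surface.resolve("11 to 16")); // label from an aggregate table
    }
}
```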

Change referent attribute to clearly denote a hierarchical relationship

The referent attribute expresses the attribute one is binding to. The binding actually concerns values: values of the attribute refer to values of the referent attribute. The relationship between values can be one to one, but also set to one, one to set and even set to set.
Restrict the referent relationship to the one-to-one and set-to-one types. Then create a new type of attribute, e.g. CorrelatedAttribute, which should allow all types of relationship. This type of binding would thus rule out several estimation algorithms (because it would require estimating the relationship between sets of attributes without further knowledge).

Export populations with coding

Today populations can only be exported as CSV with labels (like gender:"male"); add the possibility to reuse the coding of the variable to export values like gender:1
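
A minimal sketch of such an export switch, assuming entities are plain attribute-to-label maps and that each coding is a label-to-code map. CodedCsvExport and its parameters are hypothetical names, not the existing exporter, and CSV escaping is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CodedCsvExport {

    /**
     * Formats one entity (attribute name -> label) as a CSV line.
     * When useCodes is true, a label is replaced by its code if the
     * attribute has a coding that knows this label; otherwise the
     * label is written unchanged. No quoting or escaping is done.
     */
    static String toCsvLine(Map<String, String> entity,
                            Map<String, Map<String, String>> codeByLabel,
                            boolean useCodes) {
        StringJoiner line = new StringJoiner(",");
        for (Map.Entry<String, String> cell : entity.entrySet()) {
            String value = cell.getValue();
            Map<String, String> coding = codeByLabel.get(cell.getKey());
            if (useCodes && coding != null && coding.containsKey(value)) {
                value = coding.get(value);
            }
            line.add(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Map<String, String> person = new LinkedHashMap<>();
        person.put("gender", "male");
        person.put("age", "30");

        Map<String, String> genderCoding = new LinkedHashMap<>();
        genderCoding.put("male", "1");
        genderCoding.put("female", "2");
        Map<String, Map<String, String>> codings = new LinkedHashMap<>();
        codings.put("gender", genderCoding);

        System.out.println(toCsvLine(person, codings, false)); // male,30
        System.out.println(toCsvLine(person, codings, true));  // 1,30
    }
}
```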

Problem with population generation (reading the Configuration file ?)

Hi,

The TestLocalisation main (spll/example/TestLocalisation.java) does not work anymore. The problem comes from the population generation, and seems to be caused by the reading of the configuration file.

Log:
Exception in thread "main" com.thoughtworks.xstream.converters.ConversionException: Failed calling method
---- Debugging information ----
message : Failed calling method
cause-exception : java.lang.NullPointerException
cause-message : null
method : core.io.configuration.GosplConfigurationFile.readResolve()
class : core.io.configuration.GosplConfigurationFile
required-type : core.io.configuration.GosplConfigurationFile
converter-type : com.thoughtworks.xstream.converters.reflection.ReflectionConverter
path : /GosplConfiguration
version : 1.4.9

at com.thoughtworks.xstream.core.util.SerializationMembers.callReadResolve(SerializationMembers.java:82)
at com.thoughtworks.xstream.converters.reflection.AbstractReflectionConverter.unmarshal(AbstractReflectionConverter.java:282)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convert(TreeUnmarshaller.java:72)
at com.thoughtworks.xstream.core.AbstractReferenceUnmarshaller.convert(AbstractReferenceUnmarshaller.java:70)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:66)
at com.thoughtworks.xstream.core.TreeUnmarshaller.convertAnother(TreeUnmarshaller.java:50)
at com.thoughtworks.xstream.core.TreeUnmarshaller.start(TreeUnmarshaller.java:134)
at com.thoughtworks.xstream.core.AbstractTreeMarshallingStrategy.unmarshal(AbstractTreeMarshallingStrategy.java:32)
at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1230)
at com.thoughtworks.xstream.XStream.unmarshal(XStream.java:1214)
at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1178)
at com.thoughtworks.xstream.XStream.fromXML(XStream.java:1120)
at core.io.configuration.GosplXmlSerializer.deserializeGSConfig(GosplXmlSerializer.java:104)
at core.io.configuration.GosplXmlSerializer.deserializeGSConfig(GosplXmlSerializer.java:116)
at gospl.distribution.GosplDistributionFactory.<init>(GosplDistributionFactory.java:63)
at spll.example.TestLocalisation.generatePopulation(TestLocalisation.java:260)
at spll.example.TestLocalisation.main(TestLocalisation.java:105)

Caused by: java.lang.NullPointerException
at java.util.HashMap.putMapEntries(HashMap.java:500)
at java.util.HashMap.putAll(HashMap.java:784)
at core.io.configuration.GosplConfigurationFile.<init>(GosplConfigurationFile.java:42)
at core.io.configuration.GosplConfigurationFile.readResolve(GosplConfigurationFile.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.thoughtworks.xstream.core.util.SerializationMembers.callReadResolve(SerializationMembers.java:78)
... 16 more

IPF marginal builder gives inconsistent marginals

Several things to consider:

  1. Marginals could be expressed using attributes that are equivalent to, but not exactly the same as, those in the seed. In other words, an input control table could have attributes that are only linked (mapped) to sample attributes. E.g. a control table with AGE x GENDER could have a 'range age' while the sample has the AGE attribute as an 'integer age'.

  2. Marginals could be built from several sources. In other words, marginal values could be scattered across several control tables. E.g. AGE x GENDER x OCCUPATION could be inferred from two control tables, AGE x GENDER and AGE x OCCUPATION.

  3. Marginals could be built from diverging data. In other words, a marginal may not cover the entire spectrum of data values. E.g. AGE x OCCUPATION could describe marginals only for ages above 15 years old, while AGE x GENDER could cover all age values.

Hence, all in all, the marginal builder has not been able to determine marginals that are all consistent with each other (3), nor to collect and harmonize scattered data (2)
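
The consistency requirement behind point 3 can be illustrated with a textbook two-dimensional iterative proportional fitting loop. This is a generic sketch, not the MarginalsIPFBuilder code, and IpfSketch is a hypothetical name. It refuses to start when the two sets of marginals do not sum to the same total, which is the kind of mismatch diverging control tables produce.

```java
final class IpfSketch {

    /**
     * Fits a copy of the seed table so that its row and column sums match
     * the given targets, alternating row and column scaling steps.
     */
    static double[][] fit(double[][] seed, double[] rowTargets, double[] colTargets, int iterations) {
        double rowTotal = sum(rowTargets);
        double colTotal = sum(colTargets);
        // Both sets of marginals must describe the same population size.
        if (Math.abs(rowTotal - colTotal) > 1e-9 * Math.max(1.0, rowTotal)) {
            throw new IllegalArgumentException("wrong total: " + rowTotal + " vs " + colTotal);
        }
        int rows = seed.length;
        int cols = seed[0].length;
        double[][] table = new double[rows][];
        for (int i = 0; i < rows; i++) {
            table[i] = seed[i].clone();
        }
        for (int it = 0; it < iterations; it++) {
            // Row step: scale each row to its target sum.
            for (int i = 0; i < rows; i++) {
                double s = sum(table[i]);
                if (s > 0) {
                    for (int j = 0; j < cols; j++) {
                        table[i][j] *= rowTargets[i] / s;
                    }
                }
            }
            // Column step: scale each column to its target sum.
            for (int j = 0; j < cols; j++) {
                double s = 0.0;
                for (int i = 0; i < rows; i++) {
                    s += table[i][j];
                }
                if (s > 0) {
                    for (int i = 0; i < rows; i++) {
                        table[i][j] *= colTargets[j] / s;
                    }
                }
            }
        }
        return table;
    }

    private static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }
}
```

With a uniform 2 x 2 seed, row targets {30, 70} and column targets {40, 60}, the fitted table is {{12, 18}, {28, 42}}. Changing the column targets to {40, 50} makes the call fail up front instead of silently not converging.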

be able to add some attributes to a preexisting population

Use case 1: augment an existing dataset

  • a GoSPL population is already available from data (like a list of buildings)
  • we want to add two other attributes which depend on existing ones, based on contingency tables

Use case 2: chain different GoSPL tools

  • use IPF to reweight an available dataset, thus creating a synthetic population A
  • add some other attributes which are not in the sample, for which we have frequency tables
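
Both use cases come down to drawing a new attribute conditionally on one that is already present. A minimal sketch, assuming entities are plain maps and the contingency table is given as weights per value of the existing attribute. AttributeAugmenter and its methods are hypothetical, not GoSPL API.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

final class AttributeAugmenter {

    /** Draws one key with probability proportional to its weight. */
    static String draw(Map<String, Double> weights, Random rng) {
        double total = 0.0;
        for (double w : weights.values()) {
            total += w;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("weights must have a positive sum");
        }
        double r = rng.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            r -= e.getValue();
            last = e.getKey();
            if (r < 0) {
                return last;
            }
        }
        return last; // only reached through floating-point rounding
    }

    /**
     * Adds attribute `added` to every entity, drawn from the table row
     * selected by the entity's existing value of attribute `given`.
     */
    static void addAttribute(List<Map<String, String>> population,
                             String given,
                             String added,
                             Map<String, Map<String, Double>> table,
                             Random rng) {
        for (Map<String, String> entity : population) {
            Map<String, Double> row = table.get(entity.get(given));
            if (row == null) {
                throw new IllegalArgumentException("no table row for " + entity.get(given));
            }
            entity.put(added, draw(row, rng));
        }
    }
}
```

Chaining then means calling addAttribute once per new attribute on the population produced by the previous step, for example the output of IPF in use case 2.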

NDimensional matrices: how to manipulate conditional probabilities ?

Hey!
I am working towards encoding Bayesian inference based on this data structure.
How should I use these matrices to encode conditional probabilities? Should I use GosplConditionalDistribution or GosplJointDistribution? Is there a method to know which column is the reference column, that is X in p(X|Y,Z)?
Thanks
