khiopsml / khiops
Khiops is an AutoML suite for supervised and unsupervised learning
Home Page: https://khiops.org
License: BSD 3-Clause Clear License
Using KNI from Java, which worked with Khiops V9, crashes with Khiops V10.
Behavior observed only on Linux.
Lead: signal handling, introduced in Khiops V10.
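If the crash comes from Khiops installing its own signal handlers on top of the ones the JVM relies on (an assumption: the issue only names signal handling as a lead), a common mitigation is to chain to the previously installed handler instead of replacing it. The POSIX sketch below illustrates that idea; `InstallChainedHandler` and `ChainingHandler` are hypothetical names, not Khiops code.

```cpp
#include <csignal>
#include <cstring>
#include <signal.h>

// Action installed before ours (for instance by the JVM). This sketch handles
// a single signal, so one saved action is enough.
static struct sigaction g_previousAction;

// Hypothetical Khiops-side handler: does its own work, then forwards the signal
// to the previous handler instead of swallowing it.
static void ChainingHandler(int sig, siginfo_t* info, void* context)
{
    // ... Khiops-specific handling would go here ...

    if (g_previousAction.sa_flags & SA_SIGINFO)
    {
        if (g_previousAction.sa_sigaction != nullptr)
            g_previousAction.sa_sigaction(sig, info, context);
    }
    else if (g_previousAction.sa_handler != SIG_DFL && g_previousAction.sa_handler != SIG_IGN)
    {
        // SIG_DFL and SIG_IGN are deliberately not forwarded in this sketch.
        g_previousAction.sa_handler(sig);
    }
}

// Installs ChainingHandler for `sig` and saves the previously installed action.
// Returns false if sigaction() fails.
bool InstallChainedHandler(int sig)
{
    struct sigaction action;
    std::memset(&action, 0, sizeof(action));
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = ChainingHandler;
    action.sa_flags = SA_SIGINFO;
    return sigaction(sig, &action, &g_previousAction) == 0;
}
```

On the Java side, the JDK also provides a "signal chaining" facility (preloading `libjsig`) for native libraries that install handlers; whether it helps here would need to be checked.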
In the crash test scenarios, we have to write the full task signature, e.g. Database check_9_8_15.
The signature is built from the number of shared variables. When the code of the task is modified, the signature may change and the crash test scenario becomes obsolete. We should simplify this by using the task name rather than its signature.
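A minimal sketch of how a crash-test scenario could be matched by task name only, by stripping the numeric suffix from the signature. The format (task name followed by `_`-separated numbers) is inferred from the single example `Database check_9_8_15`, and `TaskNameFromSignature` is a hypothetical helper.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: strips the trailing "_<digits>" groups from a task signature,
// e.g. "Database check_9_8_15" -> "Database check". Assumes the task name itself
// never ends with "_<digits>".
std::string TaskNameFromSignature(const std::string& signature)
{
    std::string name = signature;
    while (true)
    {
        const std::string::size_type underscore = name.find_last_of('_');
        if (underscore == std::string::npos || underscore + 1 == name.size())
            break;

        // Stop as soon as the last "_" group is not purely numeric.
        bool allDigits = true;
        for (std::string::size_type i = underscore + 1; i < name.size(); i++)
            if (!std::isdigit(static_cast<unsigned char>(name[i])))
                allDigits = false;
        if (!allDigits)
            break;

        name.erase(underscore);
    }
    return name;
}
```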
I tried to enable the compilation of the Lex and Yacc files on Windows.
It does not work.
__UNIX__ pragmas.
A first proposal is to add LearningTest in the test directory with:
├── cmd
│   ├── python
├── datasets
│   ├── Adult
│   ├── Iris
│   └── Mushroom
├── MTdatasets
│   └── SpliceJunction
├── TestCoclustering
│   ├── Standard
└── TestKhiops
    └── Standard
I'm not sure all this data needs to be added. I was using LearningTest to test packages, and for this purpose Iris.txt is good enough. We have to decide what the purpose of LearningTest in the CI is.
Currently, the "ColumnLimit: 120" option in the .clang-format file makes file reformatting unstable and not very relevant:
We should:
The wiki should also explain how to enable automatic reformatting in IDEs (especially Visual C++).
In the KNI unit tests, there are:
But nothing tests side effects on multi-table data. I suggest implementing a multi-table unit test that uses the KNI API directly (and not src/Learning/KNITransfer/KNIRecodeMTFiles.h, which is too high level).
There are .rc files for MODL, MODL_Coclustering and KhiopsNativeInterface. For the moment these resource files are not used by CMake. It should be easy to use them by adding them as source files in add_executable or add_library.
Before the Khiops seminar, Luc-Aurélien created the following repos (see the "Séminaire Khiops 2023" email of 25/05/2023):
When officially switching to the open-source version, the git repos will have to be cleaned up and harmonized:
Using genere in the CMake files (issue #6) requires genere to return an exit code. This will make it possible to stop the build when something goes wrong while generating the GUI .cpp files from the .dd files.
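A minimal sketch of the error propagation such a generator needs: every failure path returns `EXIT_FAILURE`, and `main()` forwards that value, since a CMake custom command that exits with a non-zero status fails the build. `RunGenerator` is a hypothetical name and the actual .dd parsing and .cpp generation is elided.

```cpp
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical generator entry point: returns EXIT_SUCCESS only if every .dd file
// could be processed, EXIT_FAILURE otherwise.
int RunGenerator(const std::vector<std::string>& ddFiles)
{
    if (ddFiles.empty())
    {
        std::cerr << "error: no .dd file given\n";
        return EXIT_FAILURE;
    }
    for (const std::string& ddFile : ddFiles)
    {
        std::ifstream input(ddFile);
        if (!input)
        {
            std::cerr << "error: cannot open " << ddFile << '\n';
            return EXIT_FAILURE;
        }
        // ... parse the .dd file and write the generated GUI .cpp/.h files ...
    }
    return EXIT_SUCCESS;
}

// genere's main() would then simply forward the status:
//   int main(int argc, char** argv)
//   {
//       return RunGenerator(std::vector<std::string>(argv + 1, argv + argc));
//   }
```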
We can set up the MPI run in the CMakeLists like this:
add_test(NAME parallel-mpi COMMAND ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} ${MPIEXEC_MAX_NUMPROCS}
${MPIEXEC_PREFLAGS} $<TARGET_FILE:parallel_mpi_test> ${MPIEXEC_POSTFLAGS})
# Add "mpi" label to the test and set a default process number to launch.
set_tests_properties(parallel-mpi PROPERTIES LABELS "mpi" PROCESSORS ${MPIEXEC_MAX_NUMPROCS})
The parallel run is launched with:
# Run all tests labeled "mpi"
ctest --preset linux-gcc-release -L mpi --output-junit output-mpi.xml
All the other tests can be launched with one line:
ctest --preset linux-gcc-release -LE mpi --output-junit output-serial.xml
This is convenient because it is independent of the MPI implementation (MPICH, Open MPI, etc.) and uses the number of processors detected on the host.
Note: it's tempting to use the --parallel flag to execute the serial tests concurrently, but Khiops is not thread-safe.
Using Visual C++ 2022, I get the following error message:
[CMake] -- The CXX compiler identification is unknown
[CMake] CMake Error at D:\Users\miib6422\Documents\boullema\DevGit\khiops\CMakeLists.txt:10 (project):
[CMake] The CMAKE_CXX_COMPILER:
[CMake]
[CMake] cl
[CMake]
[CMake] is not a full path and was not found in the PATH.
[CMake]
[CMake] Tell CMake where to find the compiler by setting either the environment
[CMake] variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
It used to work on my PC.
It has been broken since a Windows patch and the following Visual C++ 2022 update:
Microsoft Visual Studio Professional 2022
Version 17.6.5
VisualStudio.17.Release/17.6.5+33829.357
Microsoft .NET Framework
Version 4.8.04084
Installed version: Professional
...
We need to add an action that runs LearningTest Standard in the CI/CD.
We can run the tests in serial or parallel mode, and in release or debug.
Debug will take a long time, so perhaps we should split this into 2 different actions, one for release and one for debug.
On Visual Studio 2022, setting jobs to 0 causes an issue.
The Windows variant of the unit test workflow fails because of a path error:
Output of Learning unit tests
Run build/windows-msvc-debug/bin/learning_test --gtest_output="xml:D:\a\khiops\khiops/reports/report-learning.xml"
Running main() from D:\a\khiops\khiops\build\windows-msvc-debug\_deps\googletest-src\googletest\src\gtest_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from KWClass
[ RUN ] KWClass.full
[ OK ] KWClass.full (10 ms)
[----------] 1 test from KWClass (10 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (11 ms total)
[ PASSED ] 1 test.
NewMem alloc block warning id=3400 000001ED033172E8 (120): Block not free
Memory stats (number of pointers, and memory space)
Alloc: 6538 Free: 6537 MaxAlloc: 1380
Requested: 129010 Granted: 166368 Free: 166248 MaxGranted: 55248
NewMem Warning: Block not free: 1 Memory not free: 120
Output of Create unit test report
Run dorny/test-reporter@v1
Check runs will be created with SHA=0c46fadae637321e97df169a540246a4eaff6f9d
Listing all files tracked by git
Found 1390 files tracked by GitHub
Using test report parser 'jest-junit'
Creating test report Unit Tests
Warning: No file matches path D:\a\khiops\khiops/reports/report-*.xml
Error: No test report files were found
There are still some remaining if(FULL) statements in KNITransfer and Parallel.
Update the .cmake-format.py file
Update README for the beta release.
The dev-v11 branch doesn't build on macOS. This is due to a wrong signature of the OpenApplication function in Portability.h.
This code is obsolete since Khiops 10.1, we should remove it.
The new KhiopsNativeInterfaceTutorial repo will replace the documentation and samples included in the kni packages. Therefore, the kni-doc packages and all the data used to test the kni packages will disappear from the repo.
The tests of the kni packages will use a clone of KhiopsNativeInterfaceTutorial inside the CI/CD.
For the open-source version, the documentation files (guides...) and samples will move to a website (to be defined).
The following packaging files and help panels must be updated:
To do for the open-source version (V10.1.*) and to carry over to version 11.
We already have a Conda macOS package compatible with the ARM64 architecture. However, we currently lack a Linux ARM64 version of Khiops-core. This is important for Mac users who prefer running Khiops in a Docker container (which would be based on a Linux ARM64 image). It would also target Raspberry Pi and other ARM platforms.
I conducted a performance benchmark comparing the native Conda installation of Khiops on macOS with an emulated Docker Linux-AMD64 image. The results are as follows:
These results show that running Khiops on a Linux-ARM64 architecture container might significantly improve performance.
Performance has degraded sharply, by a factor of 3 to 4, since the switch to Visual C++ 2022.
The compilation options set via CMake need to be optimized.
Examples on my machine with version V10.4.6i:
Also fixes #28
We can specify the targets to be built:
cmake --build . --target target1 target2
This will speed up the build process (especially for unit tests).
Corrections to repo:
Refactor coclustering instances x variables: simplify code, especially for classes:
The goal is to manage the production of the packages and installers built from GitHub, their storage, and their distribution via repos or the Khiops download website.
Note for the datacamp deadline (mid-October)
See here for context
Without this, the .rc files do not compile.
When I work on workflows, unit tests are triggered each time I push my work, even though I am 100% sure they will pass because I modify neither the sources nor the CMake files.
I suggest triggering the unit tests based on which files are changed by the push. Some documentation here
For example:
on:
  push:
    paths:
      - '**.cpp'
      - '**.h'
      - '**/CMakeLists.txt'
In the deb and rpm packages produced on GitHub Actions, the PDF files are shrunk to a few bytes:
ls -gG usr/share/doc/khiops/
-rw-r--r-- 1 131 juin 19 16:51 KhiopsCoclusteringGuide.pdf
-rw-r--r-- 1 132 juin 19 16:51 KhiopsGuide.pdf
-rw-r--r-- 1 132 juin 19 16:51 KhiopsTutorial.pdf
When they are produced on my computer, we get:
ls -gG usr/share/doc/khiops/
-rw-r--r-- 1 854509 juin 19 09:58 KhiopsCoclusteringGuide.pdf
-rw-r--r-- 1 1736158 juin 19 09:58 KhiopsGuide.pdf
-rw-r--r-- 1 4115198 juin 19 09:58 KhiopsTutorial.pdf
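One possible explanation, not confirmed in the issue, is that the PDFs are stored with Git LFS and the CI checkout packaged the ~130-byte LFS pointer files instead of the real content (by default, `actions/checkout` does not fetch LFS objects unless its `lfs` input is enabled). Git LFS pointer files start with the line `version https://git-lfs.github.com/spec/v1`, so a packaging step could check for it. A minimal sketch; `IsGitLfsPointer` is a hypothetical name.

```cpp
#include <fstream>
#include <string>

// Returns true if the file at `path` looks like a Git LFS pointer file rather than
// real content: pointer files are small text files whose first line is the LFS spec URL.
bool IsGitLfsPointer(const std::string& path)
{
    std::ifstream input(path);
    std::string firstLine;
    if (!input || !std::getline(input, firstLine))
        return false;
    return firstLine.rfind("version https://git-lfs.github.com/spec/v1", 0) == 0;
}
```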
Context: Khiops V11, AutoML for variables of type Text (a new type).
As part of the automatic variable construction for Text variables, a first phase analyzes the database to collect the most frequent tokens ("ngrams", "words", "tokens") in order to create sparse variable blocks.
This phase is currently implemented sequentially. The goal is to parallelize it.
Entry point:
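The token collection phase described above has a natural map/reduce structure: each worker counts tokens on its own slice of the database, the partial counts are merged, and the most frequent tokens are kept. The sketch below shows that structure with `std::thread`; it does not use Khiops's own parallel task framework, whitespace splitting stands in for the actual ngram/word extraction, and `MostFrequentTokens` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Map step: counts whitespace-separated tokens in records [begin, end) into `counts`.
static void CountTokens(const std::vector<std::string>& records, std::size_t begin, std::size_t end,
                        std::map<std::string, long long>& counts)
{
    for (std::size_t i = begin; i < end; i++)
    {
        std::istringstream stream(records[i]);
        std::string token;
        while (stream >> token)
            counts[token]++;
    }
}

// Each thread counts one slice of the records, the partial counts are merged, and the
// `topK` most frequent tokens are returned (ties broken alphabetically for determinism).
std::vector<std::pair<std::string, long long>> MostFrequentTokens(const std::vector<std::string>& records,
                                                                  std::size_t threadCount, std::size_t topK)
{
    threadCount = std::max<std::size_t>(1, threadCount);
    std::vector<std::map<std::string, long long>> partialCounts(threadCount);
    std::vector<std::thread> workers;
    const std::size_t chunk = (records.size() + threadCount - 1) / threadCount;
    for (std::size_t t = 0; t < threadCount; t++)
    {
        const std::size_t begin = std::min(records.size(), t * chunk);
        const std::size_t end = std::min(records.size(), begin + chunk);
        workers.emplace_back(CountTokens, std::cref(records), begin, end, std::ref(partialCounts[t]));
    }
    for (std::thread& worker : workers)
        worker.join();

    // Reduce step: merge the per-thread counts.
    std::map<std::string, long long> total;
    for (const auto& partial : partialCounts)
        for (const auto& entry : partial)
            total[entry.first] += entry.second;

    std::vector<std::pair<std::string, long long>> sorted(total.begin(), total.end());
    std::sort(sorted.begin(), sorted.end(), [](const auto& a, const auto& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    if (sorted.size() > topK)
        sorted.resize(topK);
    return sorted;
}
```

Because the merge only sums counts, the result does not depend on how the records are split, which makes the parallel version easy to check against the sequential one.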
How should we manage the risk of hosting the git repo on the internet, for example if a malicious user cracks the admin passwords and deletes the repo?
This is not recommended by the CMake project: See the note in https://cmake.org/cmake/help/latest/command/file.html#filesystem
The goal is to improve the coclustering IV optimization algorithms, aiming for code that is simpler, more maintainable, faster, and better performing.
The main lines of code redesign to reach these objectives are the following:
The goal is to bring into git all the ongoing developments, from the released V10.1.* version up to the internal V10.4.1i version, mainly for the following features: histograms, text, new user interface.
This first V11 version on git includes the Mac port (already included in V10.1.*), just before starting the integration of Carine's developments on instances x variables coclustering.
Bug when launching descriptive statistics on Elias's very large database (about 400 GB on disk, 300 M instances, 80 variables)
First fix: diagnosed by Bruno, bad pykhiops configuration (issue reported in pykhiops)
Despite this fix, the problem persists:
The Windows build (from the CI) of the MODL_Coclustering executable is twice its previous size (~8MB), so we have a compilation problem.
MODL
dev
The copyright header is not always followed by an empty line.
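A minimal sketch of a check that a formatting or CI step could run on each source file. It assumes the copyright header is the leading block of `//` comment lines, which is an assumption about the file layout; `CopyrightHeaderFollowedByEmptyLine` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if the leading block of "//" lines (assumed to be the copyright header)
// is followed by an empty line. Files without such a block, or consisting only of the
// header, are accepted. Lines are assumed to be stripped of any trailing '\r'.
bool CopyrightHeaderFollowedByEmptyLine(const std::vector<std::string>& lines)
{
    std::size_t i = 0;
    while (i < lines.size() && lines[i].rfind("//", 0) == 0)
        i++;
    if (i == 0)
        return true; // no leading comment block: nothing to check
    return i == lines.size() || lines[i].empty();
}
```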
Add a JSON formatter for CMakePresets.json.
jq seems to be a good choice: it is available for Linux, Windows and macOS.
After installing version 10.1.1 of KNI from khiops.com on Ubuntu Jammy, my app started crashing on KNIGetFullVersion.
I don't really need this function, since GetVersion is there.
What makes me believe this is a bug is that the header file still declares the symbol, while it has disappeared from the library.
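The mismatch between the header and the library can be confirmed by asking the dynamic loader whether the installed library really exports the symbol. A minimal Linux sketch using `dlopen`/`dlsym`; `LibraryExportsSymbol` is a hypothetical name.

```cpp
#include <dlfcn.h>
#include <string>

// Returns true if `symbolName` is exported by the shared library at `libraryPath`.
// This inspects the binary actually loaded at run time, not the header.
bool LibraryExportsSymbol(const std::string& libraryPath, const std::string& symbolName)
{
    void* handle = dlopen(libraryPath.c_str(), RTLD_LAZY | RTLD_LOCAL);
    if (handle == nullptr)
        return false; // library not found or not loadable
    dlerror(); // clear any stale error before dlsym
    void* symbol = dlsym(handle, symbolName.c_str());
    const bool found = symbol != nullptr && dlerror() == nullptr;
    dlclose(handle);
    return found;
}
```

For example, `LibraryExportsSymbol("libKhiopsNativeInterface.so", "KNIGetFullVersion")` would be expected to return false on the affected installation; the exact library file name is an assumption here.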
The generall.cmd files are used to reproduce the generation of the GUI files from the *.dd files. We could replace these Windows scripts with new CMake targets (as done for lex & yacc or the jars).
On Linux: Ninja is not necessary
On Windows: to be investigated
The goal is to restructure the implementation of the coclustering IV optimization algorithms to improve its maintainability and evolvability, so that the optimization algorithms themselves can be improved in a second step.
First step: shared management of value groups and VarParts
Second step: