bigdata-ustc / EduKTM


The Model Zoo of Knowledge Tracing Models

License: Apache License 2.0

Languages: Python 98.75%, Makefile 1.25%
Topics: knowledge-tracing-models, kpt, ekpt, dynamic-cognitive-diagnosis, deep-knowledge-tracing, dkt, dkt-plus, deep-knowledge-tracing-plus, akt, lpkt

eduktm's People

Contributors

0russwest0, fannazya, ljyustc, sone47, tswsxk, weizhehuang0827, xbh0720, xubihan0720


eduktm's Issues

The training module can't be found. Did you delete the file?


Knowledge State Depiction for DKVMN

Dear Dr. Tong,

First of all, thank you so much for putting this comprehensive overview of KTMs together. Based on your survey on knowledge tracing and the examples and algorithms you collected here, I was able to analyse my data using a DKVMN.

While the results of the RNN are very promising, I would now like to generate the knowledge state depictions. Do you by any chance know whether the code for the calculations and visualisations can be found somewhere?

Kind regards,
Carl Klukkert

[Bug: AKT]

🐛 Description

The attention function in AKTNet.py has a padding error, which may cause data leakage.

What have you tried to solve it?

original: [screenshot of the original attention code]

solution: [screenshot of the proposed fix]

Environment

(Conda) Python3.8

Additional Info

This bug won't cause any runtime error; it just mistakenly uses students' response information.
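
Since the screenshots above are not preserved, the following is only a generic sketch of the kind of future-masking that prevents this sort of leakage in attention-based KT models; it is an illustration, not the actual AKTNet.py diff:

import torch

def masked_attention(q, k, v):
    # Generic scaled dot-product attention with a future mask (illustrative only).
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5  # (batch, seq, seq)
    seq_len = scores.size(-1)
    # Mask out strictly-future positions so step t never attends to later
    # interactions, i.e. the model cannot peek at upcoming responses.
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                   device=scores.device), diagonal=1)
    scores = scores.masked_fill(future, float('-inf'))
    return torch.matmul(torch.softmax(scores, dim=-1), v)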

DKT training fails for batch size of 1

🐛 Description

DKT training fails for a batch size of 1 but works for larger batch sizes (e.g., 64).

Error Message

RuntimeError    Traceback (most recent call last)
<ipython-input-15-0a528344bb33> in <module>
      5 # Initialize and train model
      6 dkt = DKT(NUM_QUESTIONS, HIDDEN_SIZE, NUM_LAYERS)
----> 7 dkt.train(train_loader, epoch=50)
      8 
      9 # Save weights

/usr/local/lib/python3.6/dist-packages/EduKTM/DKT/DKT.py in train(self, train_data, test_data, epoch, lr)
     61                 # back propagation
     62                 optimizer.zero_grad()
---> 63                 loss.backward()
     64                 optimizer.step()
     65 

/usr/local/lib/python3.6/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    305                 create_graph=create_graph,
    306                 inputs=inputs)
--> 307         torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
    308 
    309     def register_hook(self, hook):

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    154     Variable._execution_engine.run_backward(
    155         tensors, grad_tensors_, retain_graph, create_graph, inputs,
--> 156         allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
    157 
    158 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

To Reproduce

Run the example notebook (https://github.com/bigdata-ustc/EduKTM/blob/main/examples/DKT/DKT.ipynb) and set the BATCH_SIZE variable to 1.

What have you tried to solve it?

Increasing the batch size avoids the error.
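
A guess at the cause (my assumption from the traceback, not confirmed by the maintainers): process_raw_pred drops each sequence's first interaction, so a batch containing only a single very short sequence can contribute no predictions at all, and a loss built solely from the initial empty torch.Tensor([]) has no grad_fn, which matches the RuntimeError above. A hypothetical guard, not the repo's actual fix, would skip such batches:

import torch
from torch import nn

def safe_update(all_pred, all_target, loss_function, optimizer):
    # Hypothetical guard: if a batch contributed no predictions, all_pred
    # is still the initial empty torch.Tensor([]) with no grad_fn, and
    # calling .backward() raises exactly the RuntimeError shown above.
    if all_pred.numel() == 0:
        return None
    loss = loss_function(all_pred, all_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()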

Environment

Environment Information

Operating System: Ubuntu in docker image: tensorflow/tensorflow:2.6.0-gpu-jupyter.
Also tested in Google Colab notebook.

Python Version: Python 3.6.9

pip freeze:

absl-py==0.13.0
aiocontextvars==0.2.2
altair==4.1.0
ansi2html==1.6.0
anyio==3.3.4
argon2-cffi==20.1.0
asgiref==3.4.1
asn1crypto==0.24.0
astor==0.8.1
astunparse==1.6.3
async-generator==1.10
attrs==21.2.0
Babel==2.9.1
backcall==0.2.0
backports.zoneinfo==0.2.1
base58==2.1.0
beautifulsoup4==4.10.0
bleach==4.0.0
blinker==1.4
Brotli==1.0.9
bs4==0.0.1
cached-property==1.5.2
cachetools==4.2.2
certifi==2021.5.30
cffi==1.14.6
charset-normalizer==2.0.4
clang==5.0
click==7.1.2
contextlib2==21.6.0
contextvars==2.4
cryptography==2.1.4
cycler==0.10.0
dash==2.0.0
dash-core-components==2.0.0
dash-html-components==2.0.0
dash-table==5.0.0
dataclasses==0.8
decorator==4.4.2
defusedxml==0.7.1
EduData==0.0.18
EduKTM==0.0.9
entrypoints==0.3
et-xmlfile==1.1.0
fastapi==0.70.0
fire==0.4.0
Flask==2.0.2
Flask-Compress==1.10.1
flatbuffers==1.12
gast==0.4.0
gitdb==4.0.8
GitPython==3.1.18
google-auth==1.34.0
google-auth-oauthlib==0.4.5
google-pasta==0.2.0
grpcio==1.39.0
h11==0.12.0
h5py==3.1.0
idna==3.3
immutables==0.16
importlib-metadata==4.6.3
importlib-resources==5.3.0
ipykernel==5.5.6
ipython==7.16.1
ipython-genutils==0.2.0
ipywidgets==7.6.3
itsdangerous==2.0.1
jedi==0.18.0
Jinja2==3.0.1
joblib==1.1.0
json5==0.9.6
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyter-dash==0.4.0
jupyter-http-over-ws==0.0.8
jupyter-server==1.11.1
jupyterlab==3.0.16
jupyterlab-pygments==0.1.2
jupyterlab-server==2.8.2
jupyterlab-widgets==1.0.0
keras==2.6.0
Keras-Preprocessing==1.1.2
keyring==10.6.0
keyrings.alt==3.0
kiwisolver==1.3.1
loguru==0.5.3
longling==1.3.32
lxml==4.6.3
Markdown==3.3.4
MarkupSafe==2.0.1
matplotlib==3.3.4
mistune==0.8.4
nbclassic==0.3.3
nbclient==0.5.3
nbconvert==6.0.7
nbformat==5.1.3
nest-asyncio==1.5.1
networkx==2.5.1
notebook==6.4.3
numpy==1.19.5
oauthlib==3.1.1
openpyxl==3.0.9
opt-einsum==3.3.0
opyrator==0.0.12
packaging==21.0
pandas==1.1.5
pandocfilters==1.4.3
parso==0.8.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.3.1
plotly==5.3.1
prometheus-client==0.11.0
prompt-toolkit==3.0.19
protobuf==3.17.3
ptyprocess==0.7.0
pyarrow==5.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pycrypto==2.6.1
pydantic==1.8.2
pydeck==0.6.2
Pygments==2.9.0
PyGObject==3.26.1
pygraphviz==1.6
pyparsing==2.4.7
pyrsistent==0.18.0
python-apt==1.6.5+ubuntu0.7
python-dateutil==2.8.2
pytz==2021.3
pytz-deprecation-shim==0.1.0.post0
pyxdg==0.25
PyYAML==6.0
pyzmq==22.2.1
qtconsole==5.1.1
QtPy==1.9.0
rarfile==4.0
requests==2.26.0
requests-oauthlib==1.3.0
requests-unixsocket==0.2.0
retrying==1.3.3
rsa==4.7.2
scikit-learn==0.24.2
scipy==1.5.4
seaborn==0.11.2
SecretStorage==2.3.1
Send2Trash==1.8.0
six==1.15.0
sklearn==0.0
smmap==5.0.0
sniffio==1.2.0
soupsieve==2.2.1
starlette==0.16.0
streamlit==1.1.0
tenacity==8.0.1
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.0
tensorflow==2.6.0
tensorflow-estimator==2.6.0
termcolor==1.1.0
terminado==0.10.1
testpath==0.5.0
threadpoolctl==3.0.0
toml==0.10.2
toolz==0.11.1
torch==1.10.0
tornado==6.1
tqdm==4.62.3
traitlets==4.3.3
typer==0.4.0
typing-extensions==3.7.4.3
tzdata==2021.4
tzlocal==4.0.1
urllib3==1.26.6
uvicorn==0.15.0
validators==0.18.2
watchdog==2.1.6
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.2.1
Werkzeug==2.0.1
widgetsnbextension==3.5.1
wrapt==1.12.1
zipp==3.5.0

Question about data preparation

Hello, thank you very much for making the LPKT code public. However, I run into overfitting when reproducing the results on the assist12 dataset. Could you share the data processing code for assist12 and ednet with me?

Question about Figures 1 and 3 in the LPKT paper

Hello, thanks for open-sourcing the code for this project.
I want to track the proficiency change of every knowledge concept during the learning process, as in Figures 1 and 3 of your LPKT paper.
How did you obtain the proficiency levels ranging from 0.0 to 1.0?

About LPKT's data preprocessing

[screenshot of the preprocessing code]

In the preprocessing of the assistment2017 dataset for LPKT, can't one problem correspond to multiple skills? Processed this way, a problem can only map to a single skill in the dictionary, if I'm not misunderstanding...
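
To illustrate the concern with made-up IDs (this is not the repository's preprocessing code), building a plain dict from (problem, skill) pairs keeps only the last skill seen for each problem:

problem_skill_pairs = [(101, 5), (101, 7)]  # problem 101 is tagged with two skills
problem2skill = {}
for problem, skill in problem_skill_pairs:
    problem2skill[problem] = skill  # a later pair silently overwrites the earlier one
print(problem2skill)  # {101: 7} -- the (101, 5) tag is lost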

Adding the GKT model fails in pytest

I have made my local system the same as the online system and carefully tested the code locally (all tests passed). However, I encountered the exception: E AttributeError: '_io.StringIO' object has no attribute 'buffer' after raising PR #22 to add the GKT model. To solve the problem, I downgraded flake8 from 4.0.1 to 3.9.2 in setup.py and then got the pycodestyle error report.

Dataset

I intend to run the program, but I can't download the Assist12 and Assistmentchall datasets from their official website. Would you be willing to share these two datasets?

knowledge state visualization

In the LPKT code, how can the knowledge state of students be visualized? It seems that only one question can be answered at a time, so how can we obtain the mastery level of the other knowledge concepts?

An error in the implementation of DKT!!

🐛 Description

In the DKT of EduKTM, the loss function is BCEWithLogitsLoss(), but there is already a sigmoid() at the end of forward() in Net. So shouldn't it be BCELoss() here?
When I replace BCEWithLogitsLoss() with BCELoss(), the performance improves greatly (AUC on 2009_skill_builder_data_corrected goes from ~0.75 to ~0.80).
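
A quick way to see the mismatch (a minimal sketch with dummy values, not the repo code): BCEWithLogitsLoss applies a sigmoid internally, so feeding it outputs that already went through sigmoid() effectively computes sigmoid(sigmoid(x)), squashing all predictions into a narrow band above 0.5.

import torch
from torch import nn

logits = torch.randn(4)
probs = torch.sigmoid(logits)  # what Net.forward() already outputs
targets = torch.ones(4)

# BCEWithLogitsLoss re-applies sigmoid, double-squashing the predictions:
print(nn.BCEWithLogitsLoss()(probs, targets))
# BCELoss consumes probabilities directly, matching the network's output:
print(nn.BCELoss()(probs, targets))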

This looks like a big mistake. Does this mean that the experimental results of many papers may be affected?

Questions about `process_raw_pred` function in DKT.py

Hi Dr. Tong,

There's a function named process_raw_pred in EduKTM/EduKTM/DKT/DKT.py .

EduKTM/EduKTM/DKT/DKT.py

Lines 31 to 37 in c9912f0

def process_raw_pred(raw_question_matrix, raw_pred, num_questions: int) -> tuple:
    questions = torch.nonzero(raw_question_matrix)[1:, 1] % num_questions
    length = questions.shape[0]
    pred = raw_pred[: length]
    pred = pred.gather(1, questions.view(-1, 1)).flatten()
    truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions
    return pred, truth

According to the code below (line 56), we can see that process_raw_pred is used to process the raw input and the output of the DKT model.

EduKTM/EduKTM/DKT/DKT.py

Lines 50 to 58 in c9912f0

for e in range(epoch):
    all_pred, all_target = torch.Tensor([]), torch.Tensor([])
    for batch in tqdm.tqdm(train_data, "Epoch %s" % e):
        integrated_pred = self.dkt_model(batch)
        batch_size = batch.shape[0]
        for student in range(batch_size):
            pred, truth = process_raw_pred(batch[student], integrated_pred[student], self.num_questions)
            all_pred = torch.cat([all_pred, pred])
            all_target = torch.cat([all_target, truth.float()])

I have three questions.

  1. I noticed that questions = torch.nonzero(raw_question_matrix)[1:, 1] % num_questions in line 32 slices from index 1, which means we throw away the first answer (index 0). Do you mean that the first value is not predicted, and is meaningless because it depends on no history of answer records?
  2. About pred = raw_pred[: length] in line 34: here the slice starts from index 0. Why don't we throw away the first predicted value as well, just like in line 32? e.g. pred = raw_pred[1 : length + 1]
  3. About truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions in line 36: with //, we get 0 if the non-zeros are in the first half (correct answers) and 1 if they are in the second half (wrong answers).
    However, according to the encode_onehot function in examples/DKT/prepare_dataset.ipynb, correct answers are in the first half and wrong answers are in the second half, while 1 stands for a correct answer and 0 for a wrong answer.
def encode_onehot(sequences, max_step, num_questions):
    result = []

    for q, a in tqdm.tqdm(sequences, 'convert to one-hot format: '): # e.g. q: [1,2,3]  a: [1,0,0]
        length = len(q)
        # append questions' and answers' length to an integer multiple of max_step
        mod = 0 if length % max_step == 0 else (max_step - length % max_step)
        onehot = np.zeros(shape=[length + mod, 2 * num_questions])
        print(length+mod)
        for i, q_id in enumerate(q):
            # if a[i]>0(correct answer),index=question id(first half part),else index=question id + question number(second half part)
            index = int(q_id if a[i] > 0 else q_id + num_questions)
            onehot[i][index] = 1 # correct answers are in the first half part
        result = np.append(result, onehot)
    
    return result.reshape(-1, max_step, 2 * num_questions)
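
For instance, a tiny self-contained check of the same point, with made-up values:

import torch

num_questions = 3
row = torch.zeros(2 * num_questions)
row[2] = 1  # a CORRECT answer to question 2 lands in the first half
# The nonzero index is 2, and 2 // num_questions == 0, even though
# 1 is supposed to stand for a correct answer:
print(torch.nonzero(row)[:, 0] // num_questions)  # tensor([0])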

So truth = torch.nonzero(raw_question_matrix)[1:, 1] // num_questions is not consistent with the encoding. To validate my thoughts, I ran the DKT example and printed torch.nonzero(raw_question_matrix) and truth, to compare with the encoding result stored in test.txt.

Here are two screenshots, one from the console and one from test.txt:

[screenshot: console output of torch.nonzero(raw_question_matrix) and truth]

[screenshot: the corresponding encoding in test.txt]

This shows that my thought is right.

Then I added the line truth = torch.tensor([1 if i == 0 else 0 for i in truth]) to test the performance.

[screenshot: the added line]

The average AUC is about 0.73, the same as before adding this line.

[screenshots: AUC results before and after the change]

Sorry for the long description :(. Your answers are appreciated! 😀

Data processing of LPKT

Description

There is only code to preprocess the ASSISTChallenge data in EduKTM-LPKT.
I tried to imitate your code to process the ASSIST12 and Ednet data, but the experimental results are far from those reported in the paper.
Could the code for processing the other datasets be open-sourced?
