Is this the same data as was used in the paper "A Survey on Behavior Recognition Using

data about csi-activity-recognition HOT 13 OPEN

ludlows commented on August 22, 2024

data

from csi-activity-recognition.

Comments (13)

dheeraj7092 commented on August 22, 2024

@ludlows I am having trouble opening the dataset. As you can see below, when I open it there are no values present. Can you please help me open the dataset and explain what the row and column values are?

ludlows commented on August 22, 2024

Yes, I am using the same data.

Tsardoz commented on August 22, 2024

Thanks for your reply. FYI, randomising time series data into train/validation sets is invalid. I understand the original paper you based this on also made this mistake. With 800 ms of overlap between records, almost every validation sample has at least one record in the training set that is 80% identical to it. Splits should be done on a per-subject basis. I am notifying the original authors and the journal as well. Thank you for making this code public, as hopefully this will help others in this field.
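
The overlap figure can be checked with a little arithmetic. The sketch below assumes 1000 ms windows advanced in 200 ms steps, values inferred only from the "800 ms" and "80%" numbers in this comment; the repository's actual windowing parameters may differ, and the function names are illustrative.

```python
def window_starts(total_ms, window_ms, step_ms):
    """Start times (ms) of every full window that fits in the recording."""
    return list(range(0, total_ms - window_ms + 1, step_ms))


def overlap_fraction(start_a, start_b, window_ms):
    """Fraction of one window's duration shared by two equal-length windows."""
    shared = window_ms - abs(start_a - start_b)
    return max(shared, 0) / window_ms


WINDOW_MS = 1000  # assumed window length
STEP_MS = 200     # assumed step between consecutive windows

starts = window_starts(total_ms=5000, window_ms=WINDOW_MS, step_ms=STEP_MS)
neighbour_overlap = overlap_fraction(starts[0], starts[1], WINDOW_MS)
print(neighbour_overlap)  # 0.8 under the assumed settings
```

Under these assumptions every window shares 80% of its samples with each immediate neighbour, so a shuffled split will usually place a near-copy of each validation window in the training set.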

du7092 commented on August 22, 2024

The data I downloaded is in a .csv file, but it does not seem to contain any meaningful values (it shows what look like random values rather than amplitude or phase values).

ludlows commented on August 22, 2024

@Tsardoz thanks for sharing your suggestions with me.

ludlows commented on August 22, 2024

@du7092 As I remember, the first half of each line in the CSV files is amplitude information and the second half is phase information.
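
A minimal sketch of reading rows laid out that way. It assumes every row is purely numeric with an even number of values; if the real files carry an extra leading column (a timestamp, for example), that column would need to be dropped first. The function name and the sample data are made up for illustration.

```python
import csv
import io


def split_amplitude_phase(row):
    """Split one CSV row into (amplitude, phase): first half, second half."""
    values = [float(v) for v in row]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values per row")
    half = len(values) // 2
    return values[:half], values[half:]


# Tiny made-up sample standing in for a real file (3 amplitude + 3 phase values).
sample = io.StringIO("1.0,2.0,3.0,0.1,0.2,0.3\n4.0,5.0,6.0,0.4,0.5,0.6\n")
rows = [split_amplitude_phase(r) for r in csv.reader(sample)]
amplitude, phase = rows[0]
```

For a file on disk, `open(path, newline="")` can replace the `io.StringIO` object.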

joekerrXie commented on August 22, 2024

@du7092 as I remember, the first half part of each line in csv files is amplitude info and the rest part is phase info

So that means the original data was extracted in parallel from 90 antennas, right?

joekerrXie commented on August 22, 2024

Thanks for your reply. FYI randomising time series data into train/validation sets is invalid. I understand the original paper you based this on also made this mistake. With 800 msec overlap between records almost every validation sample has at least one record in the training set that is 80% identical. Splits should be done on a per subject basis. I am notifying the original authors and journal as well. Thank you for making this code public as hopefully this will help others in this field.

When sliding the window to extract data, the window length and step length should be set the same to ensure that the extracted data does not overlap. Should data with no activity be classified separately for training to help the model understand background noise and data characteristics in a normal state?
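
The proposal amounts to setting the step equal to the window length. A small sketch of that windowing, with a hypothetical helper name and toy integer samples in place of CSI rows:

```python
def sliding_windows(samples, window, step):
    """Return consecutive windows of `window` samples, advancing by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(samples) - window
    return [samples[i:i + window] for i in range(0, last_start + 1, step)]


samples = list(range(10))

# step < window: neighbouring windows share samples
overlapping = sliding_windows(samples, window=4, step=2)

# step == window: no sample appears in more than one window
disjoint = sliding_windows(samples, window=4, step=4)
```

With `step == window` no sample is reused, but neighbouring windows still come from the same continuous recording.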

Tsardoz commented on August 22, 2024

No. The data is still not independent even if windows do not overlap. You just cannot do this. Ideally the data should be split so that no subject (person) overlaps training/test/validation sets as the data is not independent then either. Read any text on this.
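
A subject-wise split could be sketched as follows. It assumes each record carries a subject identifier, which the dataset discussed here may not expose; the `(subject_id, sample)` layout, the function name, and the default test fraction are all illustrative.

```python
import random


def split_by_subject(records, test_fraction=0.25, seed=0):
    """Split so that each subject's records land entirely in train or in test.

    records: list of (subject_id, sample) pairs.
    """
    subjects = sorted({subject for subject, _ in records})
    rng = random.Random(seed)
    rng.shuffle(subjects)  # shuffle subjects, never individual windows
    n_test = max(1, round(len(subjects) * test_fraction))
    held_out = set(subjects[:n_test])
    train = [r for r in records if r[0] not in held_out]
    test = [r for r in records if r[0] in held_out]
    return train, test


# Four hypothetical subjects with three windows each.
records = [(s, f"window-{s}-{i}") for s in ("A", "B", "C", "D") for i in range(3)]
train, test = split_by_subject(records)
train_subjects = {s for s, _ in train}
test_subjects = {s for s, _ in test}
```

If scikit-learn is available, `GroupShuffleSplit` and `GroupKFold` in `sklearn.model_selection` provide the same kind of group-aware splitting.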

Tsardoz commented on August 22, 2024

Here are three:

1. "Machine Learning for Time-Series with Python" by Ben Auffarth: This book provides a comprehensive introduction to time series analysis and emphasizes the importance of maintaining the temporal order of data to avoid breaking the independence rule, which is essential for accurate model training and evaluation <https://www.amazon.com/Machine-Learning-Time-Python-state/dp/1801819629>.
2. "A Course in Time Series Analysis" by Suhasini Subba Rao: This textbook covers various aspects of time series analysis, including the importance of preserving the sequence of data points. It explains that randomizing time series data can disrupt the inherent temporal dependencies, leading to misleading results <https://web.stat.tamu.edu/~suhasini/teaching673/time_series.pdf>.
3. "Comparing Statistical and Machine Learning Methods for Time Series Forecasting" by Ricardo P. Masini, Marcelo C. Medeiros, and Eduardo F. Mendes: This journal article discusses the application of machine learning methods to time series forecasting and highlights the necessity of keeping the data in its original order to maintain the temporal dependencies crucial for accurate predictions <https://anson.ucdavis.edu/~rmasini/files/papers/MMM-2021-JES.pdf>.

joekerrXie commented on August 22, 2024

Thank you for sharing. So the best way to ensure the independence of the dataset is to split it by subject? By that rule, it is evident that the random splitting used here leads to overfitting, which is why the test accuracy is so high. Do you have any recommended open-source datasets? I find it a bit challenging to find complete datasets on GitHub.

Tsardoz commented on August 22, 2024

Subject-wise splitting is definitely the best way. In university studies (i.e. most published ones) there are only a few subjects, so this usually leads to really poor results (which is probably why nobody does it). You can also keep the time-series nature intact and put (say) the first half of each experiment into training, then split the remainder into test and validation. This is still not ideal, as the data is not independent, but it is far better than randomising everything, which is cheating (unintentionally or otherwise).

Very few datasets are available on the internet. This was the only one I could find, but I stopped looking very soon after this. Honestly, I think this whole field is vaporware, like cold fusion: a lot of papers published about nothing. If there was anything in it we would have seen commercial devices by now. Espressif have a demo showing you can detect movement in a room, and I think that will be about the extent of it. Many technologies can do this though. https://www.hackster.io/news/espressif-shows-off-sensorless-esp-wifi-csi-radar-human-occupancy-activity-solution-909bf970a8e6

If you are cynical (like me) you might question why there are no publicly available datasets and software, and no systems you can buy. I wrote a Medium article: ***@***.***/researchers-misrepresenting-the-capability-of-human-pose-estimation-from-wifi-channel-strength-4ec4d2f871a4?sk=8904bfff93502326db6af6b632bfe8c7

My suggestion would be to look at another topic if you want to do anything meaningful.
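
The chronological alternative could look roughly like this, applied to one experiment at a time. The function name is illustrative, and placing validation before test in the remainder is an assumption, since the comment does not specify an order.

```python
def chronological_split(windows, train_fraction=0.5):
    """Split one experiment's time-ordered windows without shuffling.

    The first `train_fraction` of the windows go to training; the remainder
    is halved into validation and then test (an assumed ordering).
    """
    n_train = int(len(windows) * train_fraction)
    remainder = windows[n_train:]
    n_val = len(remainder) // 2
    return windows[:n_train], remainder[:n_val], remainder[n_val:]


# Twenty toy windows standing in for one recording, in time order.
experiment = list(range(20))
train, val, test = chronological_split(experiment)
```

If the windows were extracted with overlap, the windows on either side of each boundary still share samples, so dropping a few windows at each cut may be worth considering.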

joekerrXie commented on August 22, 2024

Yes, you are right. In fact, action recognition based on WiFi signals started a long time ago, moving from RSSI to CSI. Unfortunately, most of the models in the papers are only effective on the test data they were evaluated on, and the data in most of the papers is not traceable.
