datasaurus's Introduction

Importance of Visualizing and Analyzing Data Instead of Solely Relying on Indicators

Do you tend to look only at summary indicators or model results, without checking how the data is actually distributed? If so, you might be overlooking the power of visualization!

In 1973, the statistician Francis Anscombe addressed this problem by presenting Anscombe's Quartet, a set of four data sets and their graphs. The Quartet challenged the then-prevailing idea that numerical calculations and indicators should take precedence over graphs in data analysis.

All four data sets in the Quartet share nearly identical summary statistics, such as the mean and standard deviation of each variable, the correlation between them, and the fitted regression line, so they yield practically the same indicators. When you plot them, however, each data set shows a distinct profile. Analyzing the data only through the output of a regression or another model does not give you enough information to understand the problem, identify what is happening, and propose effective solutions.
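The Quartet is small enough to check directly. The sketch below computes the indicators for each of the four data sets using only the Python standard library. The values are the Quartet as it is commonly reproduced; the names `QUARTET`, `pearson`, and `summarize` are illustrative and not part of this repo.

```python
from math import sqrt
from statistics import fmean, stdev

# Anscombe's Quartet as commonly reproduced; sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def summarize(xs, ys):
    """Return the indicators that the four data sets have in common."""
    return {
        "mean_x": fmean(xs),
        "sd_x": stdev(xs),  # sample standard deviation
        "mean_y": fmean(ys),
        "sd_y": stdev(ys),
        "corr": pearson(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows agree to two decimal places, even though a scatter plot of each pair looks completely different: a noisy line, a curve, a line with one outlier, and a vertical cluster with one extreme point.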

To illustrate, suppose you want to investigate the relationship between the amount of advertising for a product and its sales. After collecting the data, you find that the advertising quantity has a mean of 54.26 and a standard deviation of 16.76, and that sales have a mean of 47.83 and a standard deviation of 26.93. The correlation between the two variables is -0.06. The four graphs presented in this post all have these statistical properties, so relying only on these numbers can lead to incorrect conclusions about the behavior of the variables and the relationship between them.
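To see how little a mean and a standard deviation pin down, here is a minimal sketch that forces three differently shaped synthetic samples to share the sales figures quoted above (mean 47.83, standard deviation 26.93). The samples are randomly generated for illustration and are not the data behind the graphs in this repo; `rescale` and `share_near_mean` are hypothetical helper names.

```python
import random
from statistics import fmean, stdev

TARGET_MEAN, TARGET_SD = 47.83, 26.93  # sales indicators from the scenario


def rescale(values, target_mean, target_sd):
    """Shift and stretch values so they have the target mean and sample SD."""
    mean, sd = fmean(values), stdev(values)
    return [target_mean + (v - mean) * target_sd / sd for v in values]


def share_near_mean(values):
    """Fraction of values lying within half a standard deviation of the mean."""
    mean, sd = fmean(values), stdev(values)
    return sum(abs(v - mean) <= 0.5 * sd for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
raw = {
    "uniform": [rng.uniform(0, 1) for _ in range(500)],
    "bimodal": [rng.gauss(3 if i % 2 else -3, 0.5) for i in range(500)],
    "skewed": [rng.expovariate(1.0) for _ in range(500)],
}
shapes = {name: rescale(vals, TARGET_MEAN, TARGET_SD) for name, vals in raw.items()}

for name, vals in shapes.items():
    print(
        f"{name:8s} mean={fmean(vals):.2f} sd={stdev(vals):.2f} "
        f"share near mean={share_near_mean(vals):.2f}"
    )
```

All three samples report the same mean and standard deviation, yet the share of points near the mean differs sharply between them (the bimodal sample has almost none). A histogram of each sample would show the difference at a glance.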

In Anscombe's words, a computer should "make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding."

If you want to learn more about this topic, I recommend reading the paper "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing" by Justin Matejka and George Fitzmaurice, available at this link. And if you'd like to reproduce these graphs locally to investigate this phenomenon, you can fork this repo.
