What is exploratory data analysis?

In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets that summarizes their main characteristics, often with visual methods. A statistical model may or may not be used; the primary purpose of EDA is to see what the data can tell us beyond the formal modeling or hypothesis-testing task. The term is loosely defined: it covers the use of graphics and basic sample statistics, such as the mean, median, and standard deviation, to get a feel for what information a data set might yield, and it lets analysts quickly check data for trends, outliers, and patterns. The eventual goal is to arrive at hypotheses that can later be tested in the modeling step. More specifically, EDA employs a variety of techniques, mostly graphical, to maximize insight into a data set, uncover underlying structure, extract important variables, detect outliers and anomalies, test underlying assumptions, develop parsimonious models, and determine optimal factor settings.
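As a rough illustration of the "basic sample statistics" and outlier detection described above, the following is a minimal sketch using only Python's standard library. The sample values are invented for the example, the helper names `summarize` and `iqr_outliers` are hypothetical, and the 1.5 × IQR cutoff is one common convention (the rule behind boxplot whiskers) rather than the only way to flag outliers.

```python
import statistics

# Hypothetical sample: made-up measurements with one suspicious value.
values = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 19.5, 12.0, 11.7]


def summarize(data):
    """Return basic sample statistics for a list of numbers."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }


def iqr_outliers(data, k=1.5):
    """Flag points more than k * IQR outside the quartiles (boxplot rule)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


summary = summarize(values)
outliers = iqr_outliers(values)
print(summary)
print("possible outliers:", outliers)
```

In this made-up sample the mean sits above the median because a single large value pulls it upward. A gap like that between the two statistics is the kind of early signal EDA is meant to surface before any formal modeling.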

EDA techniques are generally graphical; they include scatterplots, boxplots, and histograms, among others. In practice, data analysts freely mix elements of EDA with those of other approaches, such as the formal modeling and hypothesis testing mentioned above. The distinctions are drawn mainly to emphasize the major differences among the approaches.
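To make the idea of a histogram concrete, here is a minimal sketch that bins a small sample and prints a text-based histogram using only Python's standard library. The data, the helper name `text_histogram`, and the bin width of 1.0 are all assumptions chosen for illustration; in practice a plotting library would usually draw the chart.

```python
from collections import Counter

# Hypothetical sample: made-up values chosen only for illustration.
values = [2.1, 2.4, 2.5, 3.0, 3.1, 3.3, 3.4, 3.8, 4.2, 7.9]


def text_histogram(data, bin_width=1.0):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = Counter(int(x // bin_width) for x in data)
    lines = []
    # Walk every bin between the smallest and largest so empty bins show as gaps.
    for k in range(min(counts), max(counts) + 1):
        lo = k * bin_width
        lines.append(f"{lo:4.1f}-{lo + bin_width:4.1f} | {'#' * counts[k]}")
    return lines


hist = text_histogram(values)
print("\n".join(hist))
```

Even in this crude form, the picture shows a cluster of values between 2 and 5 and one isolated value near 8, separated by empty bins. A table of raw numbers makes that pattern much harder to spot.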