What is exploratory data analysis?
In statistics, exploratory data analysis, or EDA, is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model may or may not be used, but EDA is primarily for seeing what the data can tell us beyond the formal modeling or hypothesis-testing task. It is a loosely defined term that involves using graphics and basic sample statistics, such as the mean, median, and standard deviation, to get a feeling for what information might be obtainable from your data set. EDA techniques allow analysts to quickly look at data for trends, outliers, and patterns. The eventual goal of EDA is to form hypotheses that can later be tested in the modeling step. Put more fully, EDA employs a variety of techniques, mostly graphical, to maximize insight into a data set, uncover underlying structure, extract important variables, detect outliers and anomalies, test underlying assumptions, develop parsimonious models, and determine optimal factor settings.
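As a rough illustration of the basic sample statistics and outlier detection described above, here is a minimal sketch using only Python's standard library. The sample values, the `summarize` helper, and the 1.5 x IQR cutoff for flagging outliers are assumptions chosen for illustration; they are one common convention, not something prescribed here.

```python
import statistics

def summarize(values):
    """Return basic sample statistics plus values flagged as outliers.

    Outliers are flagged with the 1.5 * IQR rule, an assumed convention.
    """
    ordered = sorted(values)
    # Quartiles computed with the module's default ("exclusive") method.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
        "outliers": [v for v in ordered if v < low or v > high],
    }

# Hypothetical measurements with one suspicious value.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]
summary = summarize(sample)
for name, value in summary.items():
    print(name, value)
```

In this made-up sample the mean sits well above the median, which is the kind of hint that sends an analyst looking for an outlier before committing to any model.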
The three popular data analysis approaches are classical, exploratory (EDA), and Bayesian. They are similar in that they all start with a general science or engineering problem and all yield science or engineering conclusions. The difference is the sequence and focus of the intermediate steps. For classical analysis, data collection is followed by the imposition of a model (normality or linearity, for example), and the analysis, estimation, and testing that follow are focused on the parameters of that model. For EDA, data collection is not followed by model imposition. Rather, it is followed immediately by analysis, with the goal of inferring what model would be appropriate. Unlike the classical approach, EDA does not impose deterministic or probabilistic models on the data; instead, it allows the data to suggest admissible models that best fit the data.
Finally, for a Bayesian analysis, the analyst attempts to answer research questions about unknown parameters using probability statements based on prior data. Analysts may bring their own domain knowledge or expertise to the analysis and update it as new information is obtained. The purpose of Bayesian analysis is to determine posterior probabilities from prior probabilities and new information. A posterior probability is the probability that an event will happen after all evidence or background information has been taken into account. A prior probability is the probability that an event will happen before new evidence is taken into account.
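The step from prior to posterior probability can be sketched with Bayes' rule for a single yes/no hypothesis. The `posterior` function, its parameter names, and all the numbers below are hypothetical and chosen only to show the mechanics.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for a binary hypothesis H given one piece of evidence E.

    prior               -- P(H) before the evidence is seen
    p_evidence_if_true  -- P(E | H)
    p_evidence_if_false -- P(E | not H)
    """
    # Total probability of observing the evidence at all.
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / p_evidence

# Hypothetical numbers: a 1% prior, and evidence that is far more likely
# when the hypothesis is true (95%) than when it is false (5%).
prior = 0.01
after_first = posterior(prior, 0.95, 0.05)

# The posterior becomes the new prior when further evidence arrives.
after_second = posterior(after_first, 0.95, 0.05)

print(prior, after_first, after_second)
```

Feeding each posterior back in as the next prior is what lets the estimate shift as new information is obtained.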
EDA techniques are generally graphical. They include scatterplots, boxplots, histograms, and so on. In the real world, data analysts freely mix elements of all three approaches, and of other approaches as well. The distinctions above were made to emphasize the major differences among the three approaches.
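To give a feel for one of the graphical techniques named above, the following sketch draws a crude text histogram with the standard library alone. The `text_histogram` helper, the bin width, and the data are assumptions for illustration; in practice a plotting library would normally be used instead.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return one line per bin: the bin's lower edge and a bar of '#' marks.

    Assumes integer values and an integer bin width.
    """
    # Map each value to the lower edge of its bin and count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for edge in range(min(counts), max(counts) + bin_width, bin_width):
        # Empty bins are kept so that gaps in the data stay visible.
        lines.append(f"{edge:>4} | {'#' * counts[edge]}")
    return lines

# Hypothetical measurements with one value far from the rest.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]
lines = text_histogram(sample, bin_width=10)
print("\n".join(lines))
```

Even in this simple form, the long run of empty bins between the main cluster and the last bar makes the isolated value easy to spot.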