Thursday, May 16, 2024

How Data Analytics and Data Science Fit: A Joint Research Methodological Perspective - By Faiza Bukhsh and Maya Daneva

Definition: Design Science is the design and investigation of artifacts in context.

Design - design/investigate scientific methods, processes, and algorithms.
Investigation - investigate noisy, structured, and unstructured data.
Artifacts - extract or extrapolate knowledge and insights.
Context - apply in context the knowledge from data across a broad range of application domains.

There are different perspectives to define/use Data Science, depending on your field. See figure below:
The basic phases of Data Analytics can also be seen as the basic phases of Design Science. How each phase is framed really depends on the project; for a software project, for instance, the phases have to do with software development.
CRISP-DM: a Data Mining method that proposes a life cycle similar to that of any design project. It comprises Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment. It is an iterative and cyclic process. See figure below:
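To make the "iterative and cyclic" part concrete, here is a minimal Python sketch of the cycle. It is only an illustration under stated assumptions: `run_crisp_dm`, the `evaluate` callback, and the rule of looping back to Data Preparation when the evaluation misses the target are all hypothetical choices made for this example, not part of the CRISP-DM definition or of the talk.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def run_crisp_dm(evaluate, target, max_iterations=5):
    """Return the list of phases visited until the target score is reached.

    `evaluate` is a hypothetical callback: it receives the iteration
    number and returns the score obtained in that iteration.
    """
    history = []
    start = Phase.BUSINESS_UNDERSTANDING
    for iteration in range(1, max_iterations + 1):
        # Visit every phase from the current starting point up to Evaluation.
        for phase in Phase:
            if start.value <= phase.value <= Phase.EVALUATION.value:
                history.append(phase)
        if evaluate(iteration) >= target:
            history.append(Phase.DEPLOYMENT)
            return history
        # Assumption for this sketch: a failed evaluation sends us back
        # to Data Preparation rather than to the very first phase.
        start = Phase.DATA_PREPARATION
    return history  # target never reached, so no deployment


# Made-up scores: the first pass falls short, the second one passes.
scores = {1: 0.62, 2: 0.91}
trace = run_crisp_dm(lambda i: scores[i], target=0.9)
print([phase.name for phase in trace])
```

In this toy run, Data Preparation, Modeling, and Evaluation are each visited twice before Deployment, which is the kind of loop the life cycle describes.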
Different roles are responsible for each activity, but they should also collaborate and participate in each other's activities. Example:
Many times, the problem is that the data is not well prepared. We get excited to train and run our model, but if the data is noisy or poorly prepared, you will never achieve the accuracy you are looking for. At that point, you should stop spending effort mindlessly and go back to data preparation.
For acquiring data, there are a few possibilities. One solution she set up with a hospital keeps the data always on the hospital's site, on a particular server that she accesses through a VPN. For privacy reasons, she cannot access the data itself, only the results of applying algorithms to it. *This seems like a promising solution.
There is also another interesting methodology that has the same phases; what is special about it is that it always loops back to a previous activity. See figure:
There is a very interesting method called SEMMA: Sample, Explore, Modify, Model, and Assess. It comes with step-by-step guidance on how to tackle each phase:
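As a rough illustration of the five SEMMA steps, the sketch below maps each one to a small Python function on a toy dataset. Everything here is an assumption made for the example: the function names, the made-up data (`y = 2x + 1` with one missing `x`), filling missing values with the mean, and using a least-squares line as the "model". None of it comes from the talk or from the SEMMA guidance itself.

```python
import random
import statistics


def sample(rows, fraction, seed=0):
    """Sample: draw a reproducible random subset of the rows."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def explore(rows):
    """Explore: summarize size, missing values, and the mean of x."""
    xs = [x for x, _ in rows if x is not None]
    return {"n": len(rows), "missing_x": len(rows) - len(xs),
            "mean_x": statistics.mean(xs)}


def modify(rows, fill_value):
    """Modify: replace missing x values with a fill value."""
    return [(fill_value if x is None else x, y) for x, y in rows]


def model(rows):
    """Model: fit y = slope * x + intercept by least squares."""
    mean_x = statistics.mean([x for x, _ in rows])
    mean_y = statistics.mean([y for _, y in rows])
    num = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    den = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = num / den
    return slope, mean_y - slope * mean_x


def assess(rows, params):
    """Assess: mean absolute error of the fitted line on the rows."""
    slope, intercept = params
    return statistics.mean([abs(y - (slope * x + intercept)) for x, y in rows])


# Toy data (made up for this sketch): y = 2x + 1, with one missing x.
data = [(x, 2 * x + 1) for x in range(20)]
data[3] = (None, 7)

sampled = sample(data, fraction=0.5)
summary = explore(sampled)
cleaned = modify(sampled, fill_value=summary["mean_x"])
params = model(cleaned)
error = assess(cleaned, params)
print(summary, params, error)
```

Each function consumes the output of the previous one, so a weak result in `assess` points back to one of the earlier steps, typically `modify`.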
She mentions four methodologies for Design Science, among them Roel Wieringa's, Paul Johannesson's, and Peffers'. Let's start with Paul Johannesson's method.
Paul's model has a lot of cycles in it. The Data Science methods also have cycles. So we can start analyzing in which parts of one model we can insert the other.
Peffers' Design Cycle:
Wieringa's method:
In Wieringa's work, data analysis belongs in the treatment design, since this is where you do the modeling, the training, and the tuning. But if you are tackling a knowledge problem, it means that you are trying to extract knowledge from knowledge (something like an LLM). In that case, data analysis is in the setup phase (for an LLM, it will be prompt engineering).
How can I know what my artifact is (the artifact that should be designed)?
It all depends on the objective. You have to ask yourself what the goal of that design project is.

What is my artifact?
The artifact in a Data Analytics project can be the data preparation process itself, the model or the model result, or the evaluation process or evaluation criteria. So we have to ask ourselves again what the objective is. Then you will know what your artifact is.
