Thursday, May 16, 2024

How Data Analytics and Data Science Fit: A Joint Research Methodological Perspective - By Faiza Bukhsh and Maya Daneva

Definition: Design Science is the design and investigation of artifacts in context.

Design - design/investigate scientific methods, processes, algorithms.
Investigation - investigate noisy, structured and unstructured data.
Artifacts - extract or extrapolate knowledge and insights.
Context - apply the knowledge gained from data in context, across a broad range of application domains.

There are different perspectives to define/use Data Science, depending on your field. See figure below:
The basic phases of Data Analytics can also be seen as the basic phases of Design Science. How each phase is framed really depends on the project; for a software project, for instance, the phases relate to software development.
CRISP-DM: a Data Mining method that proposes a life cycle similar to that of any design project. It comprises Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment. It is an iterative, cyclic process. See figure below:
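To make the iterative nature of the cycle concrete, here is a minimal sketch in Python. The phase names come from CRISP-DM; everything else (the function name, the threshold, and the made-up evaluation scores) is my own assumption for illustration, not something shown in the talk.

```python
# Phase names are from CRISP-DM; the loop-back rule and the scores are
# assumptions made for illustration only.
PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]


def run_crisp_dm(evaluation_scores, threshold=0.8):
    """Walk the phases once per score; deploy only when a score is good enough.

    evaluation_scores: one hypothetical score per iteration, standing in for
    a real evaluation step. Returns the (iteration, phase) pairs visited.
    """
    visited = []
    for iteration, score in enumerate(evaluation_scores, start=1):
        # Every iteration goes through all phases up to Evaluation.
        for phase in PHASES[:-1]:
            visited.append((iteration, phase))
        if score >= threshold:
            visited.append((iteration, "Deployment"))
            break  # the cycle ends once the result is deployed
    return visited


trace = run_crisp_dm([0.6, 0.85])
for iteration, phase in trace:
    print(iteration, phase)
```

In this toy run the first evaluation falls short, so the project goes around the cycle a second time before reaching deployment.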
Different roles are responsible for each activity, but they should also collaborate and participate in each other's activities. Example:
Often the problem is that the data is not well prepared. We get excited to train and run our model, but if the data is noisy or poorly prepared, you will never reach the accuracy you are looking for. At that point you should stop putting in mindless effort and go back to data preparation.
For acquiring data, there are a few possibilities. One solution she set up with a hospital keeps the data always on the hospital's site, on a dedicated server that she accesses through a VPN. For privacy reasons, she cannot access the data itself, only the results of applying algorithms to it. *This seems like a promising solution.
There is also another interesting methodology that has the same phases; what is special about it is that it always loops back to a previous activity. See figure:
There is a very interesting method called SEMMA: Sample, Explore, Modify, Model and Assess. It comes with step-by-step guidance on how to tackle each phase:
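As a rough illustration of the five SEMMA steps, here is a small Python sketch that uses only the standard library. The toy data, the sample sizes, and the choice of a simple least-squares line as the model are all my assumptions; the talk did not show code.

```python
import random
import statistics

# Toy data (assumed): y is roughly 2*x + 1, with two missing targets.
random.seed(0)
rows = [(x, 2 * x + 1 + random.uniform(-0.5, 0.5)) for x in range(100)]
rows[10] = (10, None)
rows[42] = (42, None)

# Sample: draw a manageable subset of the data.
sample = random.sample(rows, 60)

# Explore: look for quality problems, here missing targets.
n_missing = sum(1 for _, y in sample if y is None)

# Modify: drop incomplete rows, then split into training and test parts.
clean = [(x, y) for x, y in sample if y is not None]
train, test = clean[:40], clean[40:]

# Model: fit a least-squares line y = slope * x + intercept on the training part.
mean_x = statistics.mean(x for x, _ in train)
mean_y = statistics.mean(y for _, y in train)
slope = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
    (x - mean_x) ** 2 for x, _ in train
)
intercept = mean_y - slope * mean_x

# Assess: mean absolute error on the held-out rows.
mae = statistics.mean(abs(y - (slope * x + intercept)) for x, y in test)

print(f"missing in sample: {n_missing}, slope: {slope:.2f}, MAE: {mae:.2f}")
```

If the Assess step gives a poor error, the natural move is to go back to Modify (or even Sample) rather than keep tuning the model.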
She mentions four methodologies for Design Science, among them Roel Wieringa's, Paul Johannesson's and Peffers'. Let's start with Paul Johannesson's method:
Paul's model has a lot of cycles in it. The Data Science methods also have cycles. So we can start analyzing in which parts of one model we can insert the other.
Peffers' Design Cycle:
Wieringa's method:
In Wieringa's work, data analysis belongs in treatment design, since this is where you do the modeling, the training and the tuning. But if you are addressing a knowledge problem, you are trying to extract knowledge from knowledge (something like an LLM). In that case, data analysis is in the setup phase (for an LLM, it will be prompt engineering).
How can I know what my artifact is (the artifact that should be designed)?
It all depends on the objective. You have to ask yourself what the goal of the design project is.

What is my artifact?
The artifact in a Data Analytics project can be the data preparation process itself, the model or the model result, or the evaluation process or evaluation criteria. So we have to ask ourselves again what the objective is. Then you will know what your artifact is.

Keynote by Carlos Ribas (Bosch) - The power of Information Systems shaping the future of the Automotive Industry

He presented an interesting tool called Bmlp, associated with an operating system named TOM, to automate smart factories. Read about it here: https://www.iotm2mcouncil.org/iot-library/news/connected-industries-news/bosch-commits-to-global-industrial-aiot/

He also discussed how Bosch invested in Digital Twins to get earlier predictions of failures in their factories. Read about it here: https://www.bosch-connected-industry.com/de/en/iiot-insights/digital-twins

He talked about the role of AI in Manufacturing
Examples of use:
He also mentioned that none of this is important if technology does not improve the life or work of the people who work in the factories.
Digital twins can help simulate equipment in the lab before it is installed in the plant. The other use is real-time data acquisition. For example, while the equipment is being tested, they can inspect the data coming from the test and discover problems on the fly, learning that a particular component produces errors under specific conditions. This is really helpful for them.

They treat data integration in these terms: data from each sensor is sent to data repositories in a specific format, with labels added. This makes it easier to retrieve data across different apps and different systems. *It seems to me they treat this at the syntactic level.
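To picture what "a specific format plus labels" could look like, here is a minimal Python sketch. The record fields, label keys, and function names are entirely hypothetical; I do not know the actual format Bosch uses.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SensorReading:
    """Hypothetical common format for one sensor measurement."""
    sensor_id: str
    timestamp: str  # ISO 8601 string
    value: float
    unit: str
    labels: dict = field(default_factory=dict)  # free-form tags, e.g. line or station


def to_record(reading):
    """Serialize a reading to the JSON text stored in the repository."""
    return json.dumps(asdict(reading), sort_keys=True)


def find_by_label(records, key, value):
    """Return all stored readings whose labels contain key == value."""
    return [r for r in map(json.loads, records) if r["labels"].get(key) == value]


repository = [
    to_record(SensorReading("temp-01", "2024-05-16T10:00:00Z", 71.5, "C", {"line": "A"})),
    to_record(SensorReading("torque-07", "2024-05-16T10:00:01Z", 12.3, "Nm", {"line": "B"})),
]
print(find_by_label(repository, "line", "A"))
```

Any application that knows the shared format and the label keys can query the repository this way, which is agreement on syntax rather than on meaning.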

I also found a link to an interesting data platform: https://www.bosch-connected-industry.com/de/en/portfolio/bosch-semantic-stack
I wonder if there are some more sophisticated semantic technologies in place, of which perhaps Carlos is not aware.

They can detect a problem during the process, not after the process is finished. That is why they feel so much in control. Faulty components are immediately rejected, removed from the process, sent for maintenance, and then returned to the process.
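The idea of catching a fault during the process rather than at the end can be sketched in a few lines of Python. The tolerance check and the function name are assumptions of mine; real factory monitoring is certainly more involved.

```python
def inspect_in_process(measurements, lower, upper):
    """Check readings as they arrive and stop at the first one out of tolerance.

    Returns (accepted, steps_checked). Works on any iterable, including a
    live stream, because it never needs the full sequence up front.
    """
    steps = 0
    for value in measurements:
        steps += 1
        if not lower <= value <= upper:
            return False, steps  # reject immediately, mid-process
    return True, steps


# The third reading is out of tolerance, so the part is rejected at step 3
# and the fourth reading is never examined.
print(inspect_in_process([1.0, 1.1, 5.0, 1.0], lower=0.5, upper=1.5))
```

The design point is the early return: the part leaves the line as soon as the fault shows up, instead of being discovered in a final inspection.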
They are not currently investing in LLMs because they do not feel the need. Sometimes a very high volume of data does not help.

The people who used to work in the plant doing mechanical work are still there, and they are being trained and "re-skilled". In recent years, the training process has been very intensive. Sometimes they do not learn new things very easily, but Bosch sees this as a mission. If they do not

People need to develop different competencies. In the future, new recruits will need to come with a degree. Currently, many of them are low-level engineers (today, 40% of the workforce has at least a degree). They need to gain knowledge about the new technologies. This is a must!

Alessandro Oltramari, an expert in logic-symbolic reasoning, is the new leader of the Carnegie Bosch Institute: https://carnegiebosch.cmu.edu/

Giancarlo asked whether this shows that this kind of approach is a current bet for Bosch. Carlos responded that it certainly is.