Friday, July 19, 2024

Keynote@FOIS'24 - Stop Data Sharing - Barend Mons

Big data:
Volume - Variety - Velocity
Volume: 10^14 assertions of the subject-predicate-object (S-P-O) type

Sensitive data - Privacy
E.g., sharing data about tigers in China could lead to their extinction.

He is working with Giancarlo, Luiz Olavo and others on constraining LLMs so that hallucinations can be avoided and explained.

Genome project - Researchers working on the genome of humanity need very sensitive data. So people in Africa tell him: "You want our data? Come visit us!".
- If you use an LLM here, you will only get the knowledge that is already out there, which is not what you need.
FAIR (and the FAIR Train Architecture) supports the inauguration of the data visiting paradigm. The data stays where it belongs (with its owner/controller) and the algorithms travel and process the data according to specific protocols and permissions.
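To make the data visiting idea concrete, here is a minimal sketch, not the actual FAIR Train implementation. The names `DataStation`, `run`, `visit` and `mean_age` are hypothetical, chosen only for illustration. The records never leave the station; the algorithm travels to each station and only its result comes back, and only if the station permits that algorithm.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, float]
Algorithm = Callable[[List[Record]], float]


class DataStation:
    """Hypothetical data station: records stay here, only results leave."""

    def __init__(self, controller: str, records: List[Record],
                 permitted: Iterable[str]):
        self.controller = controller
        self._records = records            # never handed out to the visitor
        self._permitted = set(permitted)   # names of algorithms allowed to run

    def run(self, name: str, algorithm: Algorithm) -> float:
        """Run a visiting algorithm locally if the controller permits it."""
        if name not in self._permitted:
            raise PermissionError(
                f"'{name}' is not permitted at {self.controller}")
        # The algorithm works on copies, so it cannot alter the stored data.
        return algorithm([dict(r) for r in self._records])


def mean_age(records: List[Record]) -> float:
    """Example visiting algorithm: returns an aggregate, not the raw rows."""
    return sum(r["age"] for r in records) / len(records)


def visit(stations: List[DataStation], name: str,
          algorithm: Algorithm) -> Dict[str, float]:
    """Send one algorithm to every station and collect the permitted results."""
    results: Dict[str, float] = {}
    for station in stations:
        try:
            results[station.controller] = station.run(name, algorithm)
        except PermissionError:
            continue  # this controller said no, so its data is not processed
    return results
```

In this sketch, a station that does not list `"mean_age"` among its permitted algorithms is simply skipped, so the decision stays with the data's owner/controller.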
How to approach Data Governance:
How Knowlets can help shrink the volume of big data:
The concept of qua individual helps us solve the problem of near-sameness.
The importance of near-sameness provided by Knowlets:
There are many applications in which we could never calculate or find the information on our own. Machine-readable data therefore allows us to do research in Biology that was not possible before, thanks to the processing capacity of machines. It is cognitively impossible for us to gather, find and process so much information, while machines do it in a few seconds.
He has recently started an institute that is part of the Univ. of Leiden (with the Univ. of Twente as co-founder) to do research on this topic.
Here is how research is done:
FAIR Library of Instructions:
How FAIR works:
There are many countries interested in creating institutes such as LIFES and in working together with him on this initiative (some of these contacts come from initiatives in which he participates).
Ethical and legal constraints should be made machine-readable. This way, machines know where to stop, even without understanding what ethics means. You can then add a list of ethical constraints to the data station, and before an algorithm enters the station, the station must make sure that these constraints are met.
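The admission check described above could be sketched as follows. This is only an illustration under stated assumptions: the constraint format, the field names (`purpose`, `output`) and the functions `unmet_constraints` and `admit` are hypothetical and do not come from the talk or from any FAIR specification.

```python
from typing import Dict, List

# Hypothetical machine-readable constraints held by a data station.
# Each constraint names a field that the visiting algorithm must declare
# and the set of values the station accepts for that field.
STATION_CONSTRAINTS: List[dict] = [
    {"field": "purpose", "allowed": {"public-health-research"}},
    {"field": "output", "allowed": {"aggregate"}},
]


def unmet_constraints(declaration: Dict[str, str],
                      constraints: List[dict]) -> List[str]:
    """Return the fields whose declared value is missing or not allowed."""
    return [
        c["field"]
        for c in constraints
        if declaration.get(c["field"]) not in c["allowed"]
    ]


def admit(declaration: Dict[str, str], constraints: List[dict]) -> bool:
    """The station admits an algorithm only if every constraint is met."""
    return not unmet_constraints(declaration, constraints)
```

The station does not need to "understand" ethics here: it only compares what the algorithm declares against what the controller wrote down, and refuses entry when any constraint is unmet.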

Ontological precision is very, very important. But it does not need to be perfect, since the world is vague and we like it that way. So let's work on ontologies, but forget about going the last extra mile.
