This day is dedicated to talks in honor of Barend Mons, culminating in his retirement speech this afternoon.
The day opened with a nice talk by George Strawn, followed by a funny examination of Barend's Pre-FAIR quotes (by Ruben Kok/Nico van Meeteren/Jan Velterop). Here is an interesting quote from the latter:
Then came the talk about Monssense (by Karel Luyben), a chat among three researchers and friends discussing the ideas of their research. Here is an interesting summary of Barend's main idea:
Jean Claude Burgelman talked about how science was thought of when he met Barend:
There was an interesting talk about Barend and his relationship to CODATA by Simon Hodson. And in this talk, I learned about some very relevant results from the WorldFAIR Project (still ongoing):
In particular, we should pay attention to CDIF - the Cross-Domain Interoperability Framework.
New CODATA projects:
Mercè Crosas, the new president of CODATA, also said a few words. She is also the person behind the Barcelona Supercomputing Center.
Christina Kirkpatrick (the person behind the supercomputer center in San Diego, US) told us how she met Barend and Albert and how their work started. She reminded us of the FAIR pillars (see below) and pointed out that the biggest challenge is actually changing people's mindset regarding how to deal with scientific data.
She also talked about the GO-FAIR offices around the world and their roles:
René Belsø talked about different approaches (and competencies) to developing infrastructure in this very interesting chart:
Hana Pergl was an operations manager for GO-FAIR and now works for CODATA. Here is her historical view of GO-FAIR:
Elena Giglia conducted a community-building session with interaction from the audience through Menti. It was really fun!
Carole Goble reflected on important questions, such as: What is FAIR after all? How much FAIR is enough? Some people think enough means "FAIRly AI Ready".
She is involved in the creation of several standards related to FAIR:
And the only way to really go FAIR worldwide is to make FAIR disappear:
FAIR enough (definition from the Collins dictionary): to acknowledge what someone has just said and to indicate that you understand it. (Even though perhaps you disagree -- her addition) :)
She presented a very interesting model resembling a layered/block-based FAIR architecture:
She also mentioned Biofair - a BioCommons infrastructure for UK life science researchers.
"Data and Reality" was the title of Giancarlo's talk
He talked about concepts and the need for conceptualization. He placed an important focus on "qua individuals", a favorite ontological notion of Barend's (introduced by Giancarlo in their first encounter). One of the most interesting quotes he brought to his talk was the following:
Probably the highlight was when he said that the most important piece of scientific information he has today is that counterfactual information cannot be extracted from data (data is always about something that did happen), and that counterfactual information is essential to help us shape the world. He concluded the talk by saying that being able to create such ideas and possible worlds is one of Barend's talents.
Inspired by my brother Vítor, who had the great idea of sharing conference notes, I decided to create this blog to gather my own notes. Hopefully these notes and thoughts will be useful to someone else. I would especially love them to be useful to Vítor! :) Comments, criticism and suggestions are more than welcome. They are necessary to help me make sense of the knowledge gathered here. Enjoy!
Friday, September 6, 2024
Thursday, September 5, 2024
GO-FAIR Workshop - September 5th, 2024
My question to LIFES Core Scientific Members: How can FAIR become widespread? What kind of resources/help do you provide?
- Data stewards are very important for that
- Workflows/processes published on how to do it
- Catalysts – collection of decisions that others made and that can be reused.
It is easy to make the trains interoperate within an ecosystem where people/institutions make the same choices of languages/frameworks. But we have to make the trains interoperable regardless of the ecosystem's technical choices.
Thus we need to have a Semantic FAIR library of instructions to connect all trains.
Different types of LIFES partners:
FAIR Capacity Building Program - Training developed by the GO FAIR Foundation
FAIR Data Engineering - Herman van Haagen
Hogeschool Leiden
They developed a new course named FAIR Data Engineering, starting this November.
Companies offering tools/resources for FAIR:
- 4medbox - infrastructure to control personal health data
- FAIR Solutions - provides services/resources to facilitate FAIR adoption
- WDS - focused on developing FAIR data repositories, offers services for FAIR adoption, including AI readiness resources
- Elsevier Research Collaborations - there is a branch that wants to 'sell' the data/knowledge associated with the papers. Joining LIFES makes sense for them because they want to be able to sell bits of data, instead of only selling the whole dataset.
- Visma Connect (Dyone van der Leer-de Mari) - They want to share data following a federated data model. According to her, only 10% of the data is currently accessible for use. They built a data space ecosystem. They also build taxonomies, including the taxonomy used by the Dutch government.
- Roseman Labs - they are building encrypted data spaces to enable "one to create insights from data that one doesn't have."
- Euretos - helps scientists go FAIR.
- dsm-firmenich - develops a food application focusing on nutrition and beauty in an innovative way. Involved in a project called ConnectedLabs that organizes data in a consistent way. They are becoming more and more of a data-centered company.
- Pistoia Alliance (Christian Baber/Giovanni Nisato) - this NGO has the mission to lower the barrier to innovation through pre-competitive collaboration. They have top clients from the pharmaceutical domain. GoFair is also a member.
- CODATA - organization aiming at "promoting global collaboration to improve the availability and usability of data for all areas of research" (from the website)
Figures from Euretos:
Figure from Pistoia Alliance:
There were interesting video presentations from GoFair-Brazil and Semantic Climate
and a brief talk from GOFair-USA.
There was a panel with partners from Africa:
- Easu Yazew (from VODAN)
- President of the African Academy of Sciences
- FoodSafety for Africa
They have an interesting focus on knowledge management. I talked to Peter Scelstraete (from Ubuntoo BV) and told him about my views on KM, i.e., it needs to please the knowledge holders in the first place, and exchanging tacit knowledge can only be based on social interaction.
Friday, July 19, 2024
FOIS'24 - Ontology Engineering Session
A Textual Syntax and Toolset for Well-Founded Ontologies
Matheus L. Coutinho, João Paulo Almeida, Tiago Prince Sales and Giancarlo Guizzardi
Idea behind Tonto:
- Easier-to-read syntax, like a programming language
- Text-based specification to cope with git-like mechanisms
- Dual-channel processing theory: Textual + Visual = Enhanced comprehension
- Better version control
- Easier modularization
Tonto Editor:
- Java-like syntax
- you can declare concepts, properties and relations
- you can change the text specification and have a visual (diagrammatic) view of the changes in the ontology
- they use colors for the different types of elements (stereotypes), both in the text and in the diagram
- you can use some validators: e.g., a semantically motivated syntactic validator
- they created a package manager to support modularization
Integration with OntoUML server and ontouml-js:
- JSON generation compliant with the OntoUML Schema
- Importing JSON to Visual Paradigm
- Importing JSON to Tonto
- Model validation
- Transformation to UFO-based OWL.
Keynote@FOIS'24 - Stop Data Sharing - Barend Mons
Big data:
Volume - Variety - Velocity
Volume: 10^14 assertions of the type S-P-O (subject-predicate-object)
Sensitive data - Privacy
E.g., sharing data about tigers in China would lead the tigers to extinction
He is working with Giancarlo, Luiz Olavo and others to constrain LLMs to avoid and explain hallucinations.
Genome project - Researchers working on the genome of humanity need very sensitive data. So people in Africa tell him: "You want our data? Come visit us!".
- If you use an LLM here, you will get the knowledge that is already out there, so that's not what you need.
FAIR (and the FAIR Train Architecture) supports the inauguration of the data visiting paradigm. The data stays where it belongs (with its owner/controller) and the algorithms travel and process the data according to specific protocols and permissions.
How to approach Data Governance:
How Knowlets can help shrink the volume of big data:
The concept of qua individual helps us solve the problem of near-sameness.
The importance of near-sameness provided by Knowlets:
There are many tasks in which we would never be able to calculate or find the information on our own. Machine-readable data allows us to do research in Biology that was never possible before, thanks to the processing capacity of machines. It is cognitively impossible for us to gather, find and process so much information, while machines do that in a few seconds.
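To make the two ideas above a bit more concrete for myself, here is a minimal Python sketch. It is only my own illustration, not something shown in the keynote: I assume that repeated S-P-O assertions can be collapsed into unique ones with a count, that a Knowlet can be pictured as the set of (predicate, object) pairs around one concept, and that near-sameness can be approximated by the Jaccard overlap between two such sets. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

# Toy S-P-O assertions (hypothetical data). The repeated triple mimics
# the same fact being stated by more than one source.
assertions = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "treats", "headache"),
    ("aspirin", "inhibits", "COX-1"),
    ("acetylsalicylic acid", "treats", "headache"),
    ("acetylsalicylic acid", "inhibits", "COX-1"),
    ("acetylsalicylic acid", "inhibits", "COX-2"),
    ("ibuprofen", "treats", "fever"),
]


def unique_assertions(triples):
    """Collapse repeated triples into unique ones, keeping a count per triple."""
    return Counter(triples)


def build_knowlets(triples):
    """Group the (predicate, object) pairs around each subject concept."""
    knowlets = defaultdict(set)
    for subject, predicate, obj in triples:
        knowlets[subject].add((predicate, obj))
    return dict(knowlets)


def near_sameness(a, b):
    """Jaccard overlap of two knowlets: 1.0 is identical, 0.0 is disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


counts = unique_assertions(assertions)
knowlets = build_knowlets(counts)  # iterating a Counter yields the unique triples

print(len(assertions), "assertions collapse into", len(counts), "unique ones")
print(near_sameness(knowlets["aspirin"], knowlets["acetylsalicylic acid"]))
print(near_sameness(knowlets["aspirin"], knowlets["ibuprofen"]))
```

In this toy example the two names of the same drug share most of their assertions and get a high score, while unrelated concepts score zero. A real Knowlet implementation surely uses richer information than this set overlap.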
He has recently started an institute that is part of the Univ. of Leiden (with the Univ. of Twente as co-founder) to do research on this topic.
Here is how research is made:
FAIR Library of Instructions:
How FAIR works:
There are many countries interested in creating institutes such as LIFES and working together with him in this initiative (some of these contacts come from initiatives in which he participates).
Making ethical and legal constraints machine-readable: this way, machines (even without understanding what ethics means) know where to stop. You can then add a list of ethical constraints to the data station and, before the algorithm enters the station, the station needs to make sure that these constraints are met.
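The data visiting idea with machine-readable constraints can be pictured with a small Python sketch. This is only my own reading of the talk, not the actual FAIR Train implementation: the class names `Train` and `DataStation`, the metadata fields, and the two example constraints are all hypothetical. The point is simply that the records never leave the station, and that the station checks every constraint before letting the algorithm in.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class Train:
    """A visiting algorithm plus machine-readable metadata (hypothetical fields)."""
    purpose: str
    requester_country: str
    algorithm: Callable[[list], float]


@dataclass
class DataStation:
    """Keeps the records local and only runs trains that meet every constraint."""
    records: list
    constraints: list

    def admit(self, train: Train) -> bool:
        # The station does not need to "understand" ethics: it only evaluates
        # each machine-readable constraint against the train's metadata.
        return all(check(train) for check in self.constraints)

    def run(self, train: Train) -> float:
        if not self.admit(train):
            raise PermissionError("train rejected: constraints not met")
        # Only the computed result leaves the station, never the records.
        return train.algorithm(self.records)


# Example constraints (assumptions made for illustration only).
def research_only(train: Train) -> bool:
    return train.purpose == "research"


def allowed_country(train: Train) -> bool:
    return train.requester_country in {"NL", "BR"}


def mean_age(records: list) -> float:
    return mean(record["age"] for record in records)


station = DataStation(
    records=[{"age": 34}, {"age": 51}, {"age": 47}],
    constraints=[research_only, allowed_country],
)

accepted = Train(purpose="research", requester_country="NL", algorithm=mean_age)
rejected = Train(purpose="marketing", requester_country="NL", algorithm=mean_age)

print(station.run(accepted))
try:
    station.run(rejected)
except PermissionError as error:
    print(error)
```

Expressing each constraint as a separate check keeps the list easy to extend, which matches the idea of adding a list of ethical constraints to the station.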
Ontological precision is very, very important. But it does not need to be perfect, since the world is vague and we like it that way. So let's work on ontologies, but forget about going the last extra mile.
Thursday, July 18, 2024
Keynote@FOIS'24 - Where to locate the explainability of explainable machine learning? By Mieke Boon
New title: Exploring AI/MLR Epistemology
Epistemology 16th-17th century
- Rationalism - René Descartes
- Empiricism - Francis Bacon
Thinking about Empiricism (which resembles ML processes), there was criticism:
Hume (1748), An Enquiry Concerning Human Understanding
There are fundamental problems:
- induction: the principle of induction is logically invalid;
- causality: causal connection, e.g., the power, cannot be observed.
Practical example of (lack of) causality:
Further developments in Empiricism:
Problems with ML empiricism:
Problem 1 - If you look for patterns in the data, you will find them (even if there is no causation)
Problem 2 - Need for explanations
How does (Logical) empiricism solve the problem of explanation? - without causes or mechanisms (anti-metaphysics)
The way it is looked at today in ML, explanations are similar to what we do in the lab: this variable has a lot of effect on the result, while this one does not.
Problem 3 - It denies the epistemic and pragmatic value of (causal-)mechanistic explanation: explanations of regularities.
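Problem 1 is easy to reproduce. The Python sketch below is my own illustration, not an example from the keynote: it generates variables that are pure random noise, so no variable causes any other, and then searches all pairs for the strongest correlation. The function names, the number of variables, the number of samples, and the seed are arbitrary choices of mine.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


def strongest_noise_pair(n_variables=100, n_samples=20, seed=0):
    """Search independent noise variables for the most correlated pair."""
    rng = random.Random(seed)
    variables = [
        [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        for _ in range(n_variables)
    ]
    # Exhaustive search over all pairs: the more pairs we inspect,
    # the more likely some of them correlate strongly by pure chance.
    best_pair = max(
        combinations(range(n_variables), 2),
        key=lambda pair: abs(pearson(variables[pair[0]], variables[pair[1]])),
    )
    return best_pair, pearson(variables[best_pair[0]], variables[best_pair[1]])


pair, correlation = strongest_noise_pair()
print(f"variables {pair} correlate with r = {correlation:.2f}")
```

With 100 variables there are 4950 pairs to inspect, and with only 20 samples each, the best pair shows a strong correlation even though the data is noise by construction. The pattern is found, but there is nothing behind it to explain.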
Philosophers of science criticize empiricist epistemology and aim at solutions
- Alternative epistemology should answer: what is a real law? How do we know that a mathematical structure or statistically relevant correlation found in the data is a real law? Response: iff there is a mechanism that explains the law.
- the mechanism thus makes the law intelligible - i.e., it explains the law
- human reasoning in science: rather than identifying laws, researchers explain by constructing a model of the underlying mechanism.
Kant's epistemology (18th century): concepts + power of judgement
Kant reconciled and transcended the rationalist and empiricist epistemologies by providing an alternative to the traditional questions: What is the basis of true knowledge? How can we be certain?
Kant's questions: How is it possible that we have knowledge of the world? What are the conditions for the possibility of having knowledge at all?
Kant claims:
- Man himself creates all his representations and concepts.
- Concepts are conditions for the possibility of having and creating knowledge about reality. Without these concepts, we would not be able to make any statement about reality on the basis of mere observations.
- Concepts without perceptions are empty; perceptions without concepts are blind.
- Kant considers the crucial and intellectual role of humans in creating concepts.
Kantian Epistemology => Conceptual Modeling
She presents a high-school-level example in which conceptual modeling precedes mathematical modeling:
Kant:
Concepts (Verstand) + power of judgement (Urteilskraft) - meaning, values appreciation / emotion.
How do we make sure the model is a good representation of the world?
- A picture of the Pope is a good model of the Pope.
- ... But not of Trump
However, Trump is also sitting in a chair in the same position as the Pope! So a model can be similar to the world in many different ways.
In summary...