Tuesday, April 8, 2025

REFSQ Paper Session on Responsible AI

Towards a Value-Complemented Framework for Enabling Human Monitoring in Cyber-Physical Systems.
Zoe Pfister, Michael Vierhauser, Rebekka Wohlrab and Ruth Breu

CPS are increasingly prevalent in our society - they combine the digital and physical parts of a system.

CPS regularly operate in uncertain (safety-critical) environments and interact with humans.
Runtime monitoring for hardware and software is well-established, but few approaches address monitoring the interaction and security of humans.
A paper by Ross, Winstead and McEvilley is cited in this paper.

Monitoring humans as part of CPS interactions raises concerns about privacy and data misuse, which impacts human-machine collaboration.

Cases of Amazon and a forestry company breaching people's privacy:


Integrating human values - privacy, security, or self-direction - into the requirements engineering process.

Background/Baseline
Schwartz's theory of basic values contains 10 universal values grouped into four dimensions: openness to change, self-transcendence, self-enhancement, and conservation.
Whittle et al. leveraged Schwartz's taxonomy to enhance the RE process through value portraits.
Values capture the why of RE, complementing the what (FRs) and the how (NFRs).

The framework in three slides:





Next steps:

Continuous value validation
- Validate the requirements and their implementation during system operation - monitor the monitors
- traceability
- participatory design study with real-world stakeholders

- monitoring value tactics taxonomy
- develop a catalogue of monitoring tactics
- define how specific tactics can be translated into monitoring requirements
- empirical studies with companies
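To make the "tactics to monitoring requirements" step concrete, here is a minimal sketch of my own (entirely hypothetical, not from the paper): a runtime check for the value "privacy" that flags data fields collected beyond a declared purpose.

```python
# Hypothetical sketch, not from the paper: a runtime monitor for the
# value "privacy" that flags data fields collected beyond a declared purpose.

# Assumed purpose declaration (invented for illustration).
DECLARED_PURPOSE_FIELDS = {"heart_rate", "position"}

def check_privacy_value(collected_fields: set) -> list:
    """Return the fields collected outside the declared purpose,
    i.e. candidate violations of the privacy value."""
    return sorted(collected_fields - DECLARED_PURPOSE_FIELDS)

violations = check_privacy_value({"heart_rate", "position", "face_image"})
print(violations)  # ['face_image']
```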

Ethics-aware requirements elicitation process: the paper gives more detail on this future work.

*Q: How to account for cultural differences? For example, countries that do not value privacy.
The values might be different and they might even be negotiated beforehand. But that does not invalidate the research. The general steps for a value-based RE can be reused across cultures, provided that we acknowledge that differences in cultures and values exist.

------------------------------------ Towards Ethics-Driven Requirements Engineering
Integrating Critical Systems Heuristics and Ethical Guidelines for Autonomous Vehicles. Amna Pir Muhammad, Irum Inayat and Eric Knauss

Context:
- AVs raise complex ethical challenges (e.g., safety, inclusion, societal impact)
- Design must go beyond technical performance - they should address societal values
- Lack of practical methods to apply ethical principles in early stages of RE.

RQ: How can we support ethics-aware requirements engineering for AVs?

Research Vision: Integrate Guizzardi's Ethical Framework with Critical Systems Heuristics (CSH) - Guizzardi et al.
- Support RE by: identifying diverse stakeholders' needs; defining system boundaries; and embedding ethics in the early stages of RE.

Critical Systems Heuristics (CSH) - Ulrich and Reynolds
- 12 boundary questions
- focused on the stakeholders' perspective





In the paper, they present a big table showing the questions and the rationale behind them, mapping the two adopted approaches: Ethical Framework + CSH.

Research Roadmap
- stakeholder workshops for validation
- template for ethical requirement elicitation
- empirical studies on real-world applications
- resolving ethical trade-offs (e.g. safety vs. fairness)

Closing slide:



Q: Are the questions at the system level, feature level, or use-case level?
A: They are at the general domain level; sub-divide them case by case.
Stakeholders need guidance to understand the kind of system being developed and to properly express requirements.

Limitation: they don't yet have consensus on the questions; there is still much disagreement about them (work in progress).
Methodology: they do not follow any particular methodology so far; a possible direction is to use Design Science, but they are not sure yet.

------------------------------------

Veracity Debt: Practitioners' Voices on Managing Software Requirements Concerning Veracity
Judith Perera, Ewan Tempero, Yu-Cheng Tu and Kelly Blincoe

She starts by saying this paper describes exploratory research.

Veracity - a multi-faceted concept related to the notions of truth, trust, authenticity and demonstrability:
- Is it the truth? can we trust it?
- Is it genuine? What's its origin?
- How can its credibility be demonstrated?



They previously did work on certification of supply chains, and then decided to apply this to software development as well.




She showed some demographics about the participants in the survey.

Survey results:
1) Most common type of veracity requirement: data veracity (86%), followed by regulatory and process veracity.
2) What these requirements impact:
- software architecture (12/38)
- end users (15/38)
- the software company (18/38)






*She presented an interesting discussion slide (more on the paper)


Future work
- interviews to clarify results of the survey
- proposal for veracity as a distinct quality attribute.


Limitations:
- Some important demographic data is not presented (e.g., participants' country). She said that some of the demographic questions were optional (country, for instance), so it would be unfair to present them, since you could pinpoint that info for some participants but not all.
- It is questionable how veracity requirements relate to quality attributes. There is a lot of overlap, and they do not consider this. She says they will address this in future work by examining the literature and by talking to people from different backgrounds.

It is interesting but also complex to have so many umbrella concepts, for example veracity and trust, reliability, security, etc.

Is there a possibility to mine software databases (in open source software) to check if there are veracity requirements there? She says: absolutely! They are planning on doing exactly this (besides the interviews).

REFSQ'25 Keynote on "Designing Software Means Shaping Digital Society" by Markus Oermann

He starts by talking about the two ticks that let us know whether the person "read" our message. It is meant for transparency (that's what they claim). But since no eye tracking is in place, you don't even know if the person read it or not, only that the app was open.

This has shaped society since it was created. It pressures us to respond to messages immediately! And people are not happy if we don't. It is similar to a prisoner with a guard in a tower: the prisoner is constantly being watched, and this shapes the prisoner's behavior. For the company, the model is clear: they want to maximize engagement.

Society also shapes technology. We pressured Microsoft to offer the option of turning the two ticks off, and Microsoft added that possibility for individual chats.

2010 - Facebook gave you the chance to tag someone once the system recognized the faces of your friends.

They claimed they introduced a new form of communication, a new interaction model, improving users' communication. But at the same time, Facebook was also creating a huge database for training face recognition software. In a sense, they engaged users to unknowingly participate in free labor (it's a kind of slavery). As a result, they developed their face recognition algorithm and profited greatly from it. AFTER THAT, they introduced an "off" button in case users don't want to be tagged.

2021 - users pressured Facebook again and, finally, Facebook agreed to shut down this feature permanently.

Nudge

A strategic architectural choice to lead consumers to take the action expected by the company.

Also used to regulate users. Shall we do "opt-in" or "opt-out"? So why do we need institutions? We can easily regulate them with technology. Nudging is based on behavioral economics.

Look at the book's subtitle: "Improving Decisions About Health, Wealth and Happiness".


Two very similar books, one trying to do "good" and the other teaching how to manipulate people by nudging


The COMPAS case: after the publication of the paper proving bias, the company changed its name but continues to distribute this system.


Error rate bias: the bias arises from an unequal distribution of the system's systematic errors, to the detriment of a group that has to bear the associated social costs. Why does this happen? See slide below (very interesting!!):

Social problem: these systems are optimized only in the sense of awarding benefits in the most restrictive way possible + pre-existing biases are reinforced and "self-fulfilling prophecies" are established.
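The error-rate-bias idea can be illustrated with a toy computation (all numbers invented; this is not the COMPAS data): two groups can face very different false positive rates from the same system, and the group with the higher rate bears the social cost.

```python
# Toy illustration of error rate bias (all data invented, not COMPAS data):
# the same classifier can wrongly flag far more members of one group.

def false_positive_rate(predictions, labels):
    """Share of actual negatives (label 0) wrongly flagged (prediction 1)."""
    flagged_negatives = [p for p, y in zip(predictions, labels) if y == 0]
    return sum(flagged_negatives) / len(flagged_negatives)

# Group A: 1 of 4 actual negatives wrongly flagged.
fpr_a = false_positive_rate([1, 0, 0, 0, 1], [0, 0, 0, 0, 1])
# Group B: 3 of 4 actual negatives wrongly flagged.
fpr_b = false_positive_rate([1, 1, 1, 0, 1], [0, 0, 0, 0, 1])
print(fpr_a, fpr_b)  # 0.25 0.75
```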

Moral Responsibility
The control condition and the epistemic condition have been at the basis of moral responsibility since Aristotle. This view still shapes our society.

Functionalities - requirements
Norms - rules
Values - what we live for

Work from the regulatory literature: "The Collingridge Dilemma of Technology Impact Assessment".


Research and development is becoming more and more political. Politicians discuss it more and more, sometimes without even acknowledging this at a conscious level.

Regulators/lawmakers have become aware of the fact that:
- technology itself is a resource to establish governance
- regulation is far more effective when it kicks in while there is still design flexibility
- relevant choices for society are made early in the process

R&D of digital technologies is becoming a subject of public politics, materialized in laws and regulations.

Responsibility regarding legal compliance - Art. 25 GDPR: Privacy by Design.

AI Act Art. 3 n. 1 (A/A): a "machine-based system designed to operate with varying levels of autonomy, that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers from the input it receives how to generate outputs such as predictions, content..."

Scope: Art. 2 A/A = very broad marketplace principle.
It regulates European players but also foreign players that act in the European market.
The big techs claim they make their products GDPR compliant because it was more convenient than the other choices: a) leaving the EU market; and b) creating specific products for Europe.

Direct quote from the talk: "There is a big tension between the EU regulation view and the US regulation view. How will this power play end? I don't know, we can discuss it later."

- Minimal risk: spam filters, AI in gaming
- Limited risk: AI systems such as chatbots
- High-risk AI systems should be the main focus of requirements engineering research, according to him.

He says that some further guidelines are expected to clarify the AI Act, but they won't come out before 2026. This is the normal process of creating a new regulation: it is first very abstract and clarifications come slowly. So right now, it is hard to distinguish high-risk systems by intended vs. non-intended use.

I asked a question about how hopeful we can be about laws being ethical if our society is not really ethical, and people are the creators of laws (as well as the developers of systems).
He said we should care for the liberal institutions to make sure that we keep the democratic, autonomous models of our societies.

For high-risk systems, applicability will start in August 2026. When the guidelines come in (around February 2026), we will be able to define more clearly what these systems are.

Monday, April 7, 2025

RE4AI'25 - Invited speech by Beatriz Cabrero-Daniel

Engineer and evaluate AI's impact: beyond the "good enough"

Her homepage: https://bea.cabrerodaniel.com/. Her PhD work was on crowd simulation - crowd simulation is not "good enough".

Then she went to Gothenburg to work with autonomous cars. But... ChatGPT came and changed her plans.

Figure positioning LLMs within foundation models. Everything else is in the outer AI layer: symbolic AI, ML, etc.


Eric Knauss said that developers should become more mature, so they don't build something they will need to redo, or something with such a huge negative impact that they cannot escape from it. It is a wild west right now. She wants to create metrics that help assess how good an AI is.

[Figure: a V-shaped curve from Requirements Engineering down to Testing, with "people in the middle fighting with each other".]

Perhaps LLMs or ML can help create this middle ground. Which metrics should we use? We need to ask practitioners. The real world is really messy. If practitioners cannot use the metrics we develop, then why have them?
Large Language Models
- 3% of LLM content comes from Wikipedia.
- Unexpected capabilities or behaviors emerged that not even the developers anticipated.
- What you can do with an LLM - its biggest strength is formulating text. You can even tell it that you have a brain disorder and cannot handle polite words or compliments, and that you want it to shout at you.
- An LLM can create a specification out of text, even from something like: "formulate the missing user stories so that the system can support the previous user story".
- Sometimes LLMs "think" in different languages (a part in Chinese, a part in Portuguese) and then decide at the end which output language to use.
- An LLM is not able to combine...

Consideration of the implications for the role of a requirements engineer: is the job in danger?
- Better prompts = better answers. More data = better answers. But still: undoubtedly no.
- This is not a good question to ask, and if you ask it tomorrow, it will respond with a different thing.
- Especially for requirements... you should not trust an AI.

His product is much better because it is based on AI, but he makes sure that HE CONTROLS the AI. Anyone who mistrusts AI is ready enough to use it.
Options: private endpoints without big cost; self-hosted AI models; AI models on premises. The advantage is that nobody can track anything if they are private endpoints.

European AI Act
- He is not a fan of letting AI decide anything. It can give a decision recommendation to a human, who assesses it and presses a button (if you'd like) accepting or denying it.
- Compulsory: disclosing that the content was generated by AI (only text has an exception; other media - video, audio, etc. - must abide by this law). The watermark will not be removed from video/audio even if reproduced.
- "Human in the loop": who is held responsible if a mistake is made by the AI? YOU. Nobody else, just you.

IREB Special Interest Groups
Our SIGs serve as platforms for expertise, addressing and advancing current subjects by:
- raising awareness with the community
- expanding knowledge for the community
- interactively exchanging with the community
- preparing standardization
Once-a-month online meetings. He can put anyone interested in contact to join this group.

RE4AI'25 - Invited Speech by Sallam Abualhaija

Past work: ARTAGO project - GDPR compliance. Regulated documents - for privacy policies.
- New class of requirements: content requirements - they talk about the content of the documents that regulate the contracts and relationships involved in the system's use.
- Modeling the General Data Protection Regulation - conceptual models of GDPR privacy policies and data processing agreements. These agreements are the enablers of the relationships; the content requirements are useful for that.
- She built some automated solutions for compliance and completeness checking.

Present work: application in FinTech. Future FinTech (National Center of Excellence in Research and Innovation). The finance field is heavily regulated: financial services such as online banking or investment management tools must always comply with the regulations.
Challenge: regulatory change. How to cope with that? More modeling: they elicited and modeled requirements from the UCITS directive, prospectuses and real fund data.

Ongoing and future activities:
- From regulation to implementation. RegCheck: program analysis for regulatory compliance assessment of FinTech software. Look into how transparent and easy to implement these content requirements are.
- The bigger picture of regulatory change identification. AFRICA: Automated Financial Regulations Change Impact Analysis. What is the implication of this automation for the requirements we have? She is looking into it.
- The bigger picture of modeling financial regulations. RUMOFA: Runtime Monitoring of Fund Activities. Working towards providing transparency to customers.
- Incremental compliance checking. ICCOFIDO: Incremental Compliance Checking of Financial Documents.

In summary: trustworthy AI-enabled software must be able to connect to legal requirements by understanding and representing them in a machine-analyzable way.

RE4AI'25 - Invited Speech by Sjaak Brinkkemper

The Moonshot idea

- administrative burden of healthcare workers
- speech and video recognition, GPTs and LLMs are compelling technologies
- let's generate the report of a medical conversation with current AI technologies!
- holistic: full report

Demo on generative AI in conversation. The spin-off has 7 vacancies to hire new personnel for the company.

Agentic system to generate reports on doctor's consultation with a patient.

The demo writes the report, adding a note every 40-60 seconds. It gives warnings when there are discrepancies between what is said and what is stored. It also warns the doctor if she tries to extend the appointment beyond the limit set by law. The report is generated in several formats: topics, running text. After the consultation, the doctor's assistant sends some extra information, adding that there is a problem between the patient and her partner, so the agent adapts the report in light of this new info. In the end, it can export and generate the final report.
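A minimal sketch of the discrepancy-warning behavior described above (my own illustration; names and structure are invented, not the spin-off's architecture):

```python
# Invented illustration of the demo's discrepancy warnings, not the
# spin-off's actual architecture: the agent appends notes and warns when
# a heard value contradicts what is already stored.
from dataclasses import dataclass, field

@dataclass
class ReportAgent:
    stored_facts: dict
    notes: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    def add_note(self, key: str, heard_value: str) -> None:
        """Append a note; emit a warning if it contradicts a stored fact."""
        self.notes.append(f"{key}: {heard_value}")
        stored = self.stored_facts.get(key)
        if stored is not None and stored != heard_value:
            self.warnings.append(
                f"discrepancy on '{key}': heard '{heard_value}', stored '{stored}'")

agent = ReportAgent(stored_facts={"allergy": "penicillin"})
agent.add_note("allergy", "none reported")
print(agent.warnings[0])  # discrepancy on 'allergy': heard 'none reported', stored 'penicillin'
```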

Research question: how can we architect software applications for automated reporting of human activities?

Application 1: healthcare domain, but it may be generalized to other fields.
Application 2: social welfare, in municipalities. The consultants spend 30% of their working time on administrative duties. Waiting lists in municipalities.
Application 3: national police. They are still prototyping and want to have the ethical and security part resolved before it goes into operation.

NLP4RE - Paper Session 01 - Hybrid session

Open Challenges in NLP for NFRs: A Focus on Semantics, Generalization, and Interpretability.
Rrezarta Krasniqi

Key research question: what are the key challenges to leveraging existing NLP models within the NFR domain, with respect to semantic soundness, ontology generalizability and output interpretability?

The system shall ensure high security for financial transactions.

Ambiguity: encryption, access control... etc. What exactly does "high security" mean?

*This work connects really well with OBRE. We should contact her to collaborate. Many well-illustrated problems:

"The website shall have low latency response times during high traffic events, ensuring a smooth user experience." Current NLP models fail to capture the context of "slow" since it is only inferred.
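A keyword lexicon shows why this is hard (a minimal sketch of my own, not the author's approach): surface matching catches explicitly vague terms, but a quality that is only implied, like slowness, is invisible to it.

```python
# Minimal sketch (my own, not the author's approach): flag explicitly vague
# quality terms in NFR text via a hand-made lexicon. It finds "low latency"
# and "smooth", but an implied quality like slowness is invisible to it --
# exactly the semantic gap the talk points out.
import re

VAGUE_TERMS = {"high security", "low latency", "smooth", "fast", "user-friendly"}

def flag_vague_terms(requirement: str) -> list:
    """Return lexicon terms that literally occur in the requirement."""
    text = requirement.lower()
    return sorted(t for t in VAGUE_TERMS
                  if re.search(r"\b" + re.escape(t) + r"\b", text))

req = ("The website shall have low latency response times during "
       "high traffic events, ensuring smooth user experience.")
print(flag_vague_terms(req))  # ['low latency', 'smooth']
```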

She proposes a roadmap concerned with three different aspects:

Each of the aspects is presented and illustrated. Ontology generalizability:



Interpretability:


------------- Main motivations:

They just applied GPT because the company had a license; there was no comparison with other LLMs. There was a question about this, suggesting that perhaps ChatGPT's costs are not needed. Applied approach: empirical studies (4 studies).


In general:


Study 1




Precision is very high, but recall is moderate, meaning that ChatGPT missed quite a few instances.
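For reference, this is what "high precision, moderate recall" means numerically (the counts below are hypothetical, not the paper's numbers): few false positives, but many missed instances.

```python
# Hypothetical confusion counts, not from the paper: high precision with
# moderate recall means few wrong extractions but many missed instances.

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

tp, fp, fn = 45, 5, 30  # invented counts
print(precision(tp, fp))  # 0.9
print(recall(tp, fn))     # 0.6
```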

Study 2:


Impressions of the engineers (there is no ground truth for this study, so the assessment is quite subjective).

*All the content of these studies is contained in a GitHub link presented in the paper.

REFSQ'25 - CREARE + VivaRE - Invited Talk

Towards Responsible Agile Software Development: A Process-Model-Driven Approach
Chair of PM and IS
Humboldt-Universität zu Berlin

Cielo González
- 12 years of experience in industry
- IT project manager, product owner, requirements engineer
Domains: financial, automotive, education, government, aerospace

Background
PhD on software development methodologies/processes
MSc in IS Management
BSc in Systems Engineering

Who is the driving force behind the rapid technological progress and growing digitalization?

Tech companies,
Entrepreneurs
Research institutions
Economic incentives
Users...

In summary: people are at the center.
In a highly digital society, defining how we create software has an impact on how we shape society.
Agile methodologies are (among) the most popular approaches now.

Challenges:
- lack of effective communication
- lack of project documentation
- lack of frequent feedback
- lack of technology resources

She works on Ethics and especially considering the impact on groups and communities.
In her talk, she presents seven reasons why ethics is not the focus of software development, and she focuses on ethics being treated as something external, like an add-on.
She is developing a method with full automated support for ethicality requirements. *It is very interesting!!

Three main features are highlighted:
- Requirements and specification - based on fairness and well-being checklists associated with activities in a BPMN model.
- Follow-up - reminds analysts that they have a warning for some activities (shown in red in the model) whose checklist was checked as containing ethical issues.
- Track changes - tracks the several versions, helping analysts prioritize requirements.
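The follow-up feature could look roughly like this (names invented by me; this is not the actual tool): activities whose ethics checklist still has open issues are rendered red in the model.

```python
# Invented sketch of the "follow-up" feature, not the actual tool: a BPMN
# activity turns red while its ethics checklist still has open issues.
from dataclasses import dataclass, field

@dataclass
class Activity:
    name: str
    checklist_issues: list = field(default_factory=list)

    @property
    def color(self) -> str:
        return "red" if self.checklist_issues else "green"

activities = [
    Activity("Collect applicant data", ["fairness: proxy attributes present"]),
    Activity("Notify applicant"),
]
flagged = [a.name for a in activities if a.color == "red"]
print(flagged)  # ['Collect applicant data']
```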

Checklist
Additionally:
She plans to use neural networks to bring models from paper into the tool.