Tuesday, November 5, 2019

ER 2019 - Tutorial - Data-Driven Requirements Engineering - By Xavier Franch

*My comments are marked with an asterisk

Download the full set of slides

This is a state-of-the-art talk about Data-driven RE:
- What are the main issues
- What is the landscape of solutions

----

What has changed over the years?

- From the seventies, with models such as COCOMO (Constructive Cost Model), to the nineties, with Prof. Ricardo Valerdi's takeaway messages.


----

How to ensure that the system is delivering the right value to stakeholders?
On one side, the Conceptual Modeling community; on the other, the Agile Development ideas that give voice and power to the stakeholder. This is a dichotomy.

----

From traditional RE...
- interviews
- questionnaires
- ethnography
- focus group 
From all these techniques, the requirements of the product emerge

---- 

to data-driven RE (DDRE)

The requirements engineer now has other kinds of artifacts at their disposal:
- repositories of code
- feedback mechanisms
- log files that record the real use of the system by stakeholders
- information related to project management
Again, the requirements emerge from the use of such artifacts.

----

The proposal is not to get rid of what we have, but to change a bit the focus and use the data that provide real evidence about the use of the system.

----

Toward Data-driven Requirements Engineering -- the first paper on the topic
Walid Maalej and friends.

In the last edition of ICSE, there was an update of this work in a short paper


The Data-driven RE Cycle


----

Research areas:
  1. Explicit Feedback
  2. Implicit Feedback
  3. Combined Explicit and Implicit Feedback
  4. Repository Mining
  5. Decision-making
  6. Processes 

----

Behind the Curtains:
DDRE relies on two kinds of techniques - NLP and ML

  • NLP: processing textual information
  • ML: learning from data (every context is different)


It is important to note that results from one context cannot be applied in another, because the data is different.
*Moreover, we have to be patient: to get meaningful results, we must reconfigure and try different techniques to improve the results once we get them.

What we can reuse is our knowledge of which techniques to use, and how to improve the results. 

NLP:
A) Preprocessing: tokenization (at the level of sentences and words), stemming/lemmatization (lemmatization is similar to stemming, but more accurate), phrasing (part-of-speech tagging).
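*A minimal sketch of these preprocessing steps using NLTK (the example sentence is mine; the data packages must be downloaded once):

```python
# Tokenization, stemming, lemmatization and POS tagging with NLTK.
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the required NLTK data packages.
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

review = "The app crashes constantly. Crashing started after the update."

sentences = sent_tokenize(review)   # sentence-level tokenization
words = word_tokenize(review)       # word-level tokenization

# Stemming vs. lemmatization: both normalize word forms, but
# lemmatization returns actual dictionary words (more accurate).
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
print([stemmer.stem(w) for w in words])         # e.g. 'crash', 'crash'
print([lemmatizer.lemmatize(w) for w in words])

# Phrasing: part-of-speech tagging.
print(nltk.pos_tag(words))
```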



----

1) Explicit Feedback 

Gathering, analysing and summarizing feedback given by the user

Three processes to be supported:
  • Feedback Gathering
  • Feedback Analysis
  • Feedback Summarization
Like preprocessing, clustering is a supporting technique for all Explicit Feedback processes, as in the sketch below.
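*A minimal sketch of clustering feedback with scikit-learn (the feedback texts are invented for illustration):

```python
# Group similar feedback items: TF-IDF vectors clustered with k-means.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

feedback = [
    "app crashes on login",
    "login screen freezes",
    "please add offline mode",
    "offline support would be great",
]

X = TfidfVectorizer().fit_transform(feedback)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for text, label in zip(feedback, km.labels_):
    print(label, text)   # crash-related items vs. offline-mode requests
```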

Most of the effort in DDRE is on Explicit Feedback

----

Feedback Gathering

  • Communication style: push vs. pull
  • Mode: from linguistic to multi-modal
  • Channel: app stores, forum, social media etc.
  • Advanced: feedback of feedback (to inform about the feedback approach itself)
Each of these communication styles, modes, and channels has different features, which may be advantageous or disadvantageous.


----

Feedback Analysis
  • Categorization - bug report or feature request?
  • Sentiment Analysis - whether the expressed sentiment is positive, negative, or mixed
  • Topic Modeling - more bottom-up (clustering)
----

Categorization

Sometimes, the border between a bug and a feature is blurry. There is a famous sentence, which is also the title of a paper: "It is a feature, not a bug."

Categorization can be more elaborate (see definitions in the slide):

  • Noise
  • Unclear
  • Unrelated


Problem: No single classifier works best for all review types and data sources. 
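*Since no single classifier wins everywhere, one typically benchmarks several; a hypothetical sketch of one candidate pipeline with scikit-learn (labels and texts invented):

```python
# Review categorization: TF-IDF features plus a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "App crashes when I open the camera",   # bug report
    "Please add a dark mode option",        # feature request
    "Love it, five stars",                  # noise / unrelated
]
train_labels = ["bug", "feature", "noise"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)

print(clf.predict(["It freezes every time I upload a photo"]))  # likely 'bug'
```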


----

Sentiment Analysis

Process of assigning a quantitative value to a piece of text expressing an affect or mood. 



Deep learning for sentiment analysis: A survey
Lei Zhang et al. 

Of course, it is tricky, e.g.: "Great, I love this new feature that gives me this wonderful headache." Machines do not recognize irony.
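*A minimal sentiment-scoring sketch with NLTK's VADER analyzer, which assigns exactly such a quantitative value; the sarcastic example shows the irony problem:

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # one-time download

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("I love this new feature!"))
# -> {'neg': 0.0, 'neu': ..., 'pos': ..., 'compound': ...}

# Irony defeats lexicon-based scoring: this complaint is likely scored
# as positive because of the words 'great', 'love' and 'wonderful'.
print(sia.polarity_scores(
    "Great, I love this new feature that gives me this wonderful headache"))
```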

----

Topic Modeling

  • Identifying the topics that best describe a corpus (usually latent, i.e. they emerge during the process)
  • Each corpus is described by a distribution of topics (and each topic by a distribution of words)
  • Most popular algorithm: LDA (Latent Dirichlet Allocation)
Problems of LDA:
  • instability (order effect: sentences in a different order give different results)
  • fails to capture rich topical correlations

No clustering is needed with LDA.
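*A small LDA illustration with scikit-learn (the four documents are invented; real corpora are far larger):

```python
# LDA: each document becomes a distribution over topics,
# and each topic a distribution over words.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "app crashes on startup after update",
    "crash when opening the camera, please fix",
    "would love a dark mode and bigger fonts",
    "a dark theme option would be a nice feature",
]

vec = CountVectorizer(stop_words='english')
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)   # per-document topic distributions

terms = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-4:]]
    print(f"topic {i}: {top}")      # crash topic vs. dark-mode topic
```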

----

Scaling Problem: 

Some of these NLP techniques have scalability problems: they work well with small samples, but not with large ones.


----

Summarization

Summary of what has been found out in any of the supporting processes for Explicit Feedback.

----

Problems with Explicit Feedback

- Motivation of the user to provide feedback
- Reliability of the results
- Privacy - how to deal with sensitive data
- Reputation

----

2) Implicit Feedback

Getting feedback from the user without her involvement

Two main instruments:

  • Monitoring infrastructure
  • Log files

These two instruments may be combined

Interesting paper: Monitoring the service-based system lifecycle with SALMon

In domains like IoT, this is a basic technique: monitoring provides valuable information on how to update the network to better serve user needs.

----

Importance of context

3LConOnt: a three level ontology for context modeling in context-aware computing.
*Here, modeling is finally required!


  • Time
  • Location
  • User Profile 
  • etc.
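*Just to make the context dimensions concrete, a toy sketch (real context models are richer ontologies, as in 3LConOnt):

```python
# A minimal representation of the context in which feedback is produced.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Context:
    time: datetime
    location: str
    user_profile: str   # e.g. "novice", "power user"

ctx = Context(time=datetime.now(), location="office", user_profile="power user")
print(ctx)
```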



One of the problems is not being able to discover context at design time. Thus, ML may come to our rescue once more.
Example reference:
ACon: A learning-based approach to deal with uncertainty in contextual requirements at runtime
Knauss and friends

It is a very challenging field in RE!

----

Usage Log

Information about usage of the system
What can be discovered:




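*As an illustration, a hypothetical sketch of extracting usage information from a log file (the log format - timestamp, user id, feature name - is invented):

```python
# Count how often each feature is actually used, from a usage log.
from collections import Counter

feature_counts = Counter()
with open("usage.log") as log:      # hypothetical log file
    for line in log:
        # e.g. "2019-11-05T10:32:01 user42 export_pdf"
        timestamp, user, feature = line.split()
        feature_counts[feature] += 1

# Rarely used features may be candidates for removal or redesign;
# heavily used ones may deserve further investment.
for feature, count in feature_counts.most_common():
    print(count, feature)
```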

----

3) Combining Explicit and Implicit Feedback

Xavi's group did this by using a Domain Ontology they created for Explicit and Implicit Feedback

They also have some work on Crowd-based RE

----

4) Repository Mining

There are some features that may only be discovered by looking at internal properties of the products.

Mine repositories to find out what kinds of requirements must be included in the system.
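*A lightweight mining sketch: counting file churn in a git repository as a proxy for change-prone components that may hide quality requirements (the top-10 cutoff is arbitrary):

```python
# List every file touched by every commit, then count changes per file.
import subprocess
from collections import Counter

out = subprocess.run(
    ["git", "log", "--name-only", "--pretty=format:"],
    capture_output=True, text=True, check=True,
).stdout

churn = Counter(line for line in out.splitlines() if line.strip())
for path, changes in churn.most_common(10):
    print(changes, path)
```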

----

Quality Models



The difference between the quality models of the 70s and 80s and today's is that today's are tailored to the organization, while the early ones were meant to be universal. The main point: even if there is business value in a specific metric, you can only use the metrics for which you have data; otherwise, you simply cannot measure it.

----

Decision Making Tools for RE

Code analytics tools: good for developers, but not for requirements engineers.

So, the idea is to have Strategic Dashboards, where more strategic information may help the decision maker.

Simulation capabilities may also serve well to analyse the impact of different choices on the metrics (which may also come from Quality Models), as in the sketch below.
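*A hypothetical sketch of such a strategic indicator: a weighted aggregation of quality metrics, plus a what-if simulation (metric names, weights and values are invented):

```python
# Strategic indicator = weighted sum of normalized quality metrics.
metrics = {                 # each metric normalized to [0, 1]
    "test_coverage": 0.72,
    "bug_density": 0.85,    # inverted so that higher is better
    "build_stability": 0.90,
}
weights = {"test_coverage": 0.4, "bug_density": 0.35, "build_stability": 0.25}

quality = sum(weights[m] * v for m, v in metrics.items())
print(f"product quality indicator: {quality:.2f}")

# Simulation: rerun the aggregation with hypothetical metric values to
# analyse the impact of a choice (e.g. investing in test coverage).
what_if = dict(metrics, test_coverage=0.90)
simulated = sum(weights[m] * v for m, v in what_if.items())
print(f"after improving coverage: {simulated:.2f}")
```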

----

Decision making needs to involve the relevant stakeholders

Gamification Approaches may help motivate the stakeholders

----

Liquid Democracy

<slide here>

----

DDRE in context

How can practitioners use this information and integrate it into their processes and tools to decide what should be done?
- This is considered one of the three biggest challenges in the area.

----

QRapids proposes a cycle integrating different stakeholders and processes and using a Dashboard and Mined Data in order to generate requirements.

How Can Quality Awareness Support Rapid Software Development? - A Research Preview
(work connected to his QRapids project, which has just finished)

----

QRapids Challenges to adoption

  • Tailoring to the company
  • Integration with the company WoW (Way of Working)
  • Shared vocabulary
*Ontologies may be useful here for the third topic.


Value is derived from:

  • Informativeness (how informative the method is)
  • Transparency (information needs to be connected to the data, and users must be able to drill down to the exact bit of data that produced a decision)
Paper: Continuously Assessing and Improving Software Quality with Software Analytic Tools: A Case Study

----

Lessons Learned

  • Incremental adoption
  • Monitor progress with strategic indicators
  • Involve experts 

Values:
  • Transparency as a business value
  • Tailoring to different scopes
  • Technological value: single access point to software quality related data.

Paper: Continuously Assessing and Improving Software Quality with Software Analytic Tools: A Case Study (same paper as above)

----

Online Controlled Experimentation involving stakeholders.

Two great papers: 
- Experimentation growth: Evolving trustworthy A/B testing capabilities in online software companies
- Raising the odds of success: The current state of experimentation in product development
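*A toy sketch of the analysis behind such an experiment: a two-proportion z-test comparing conversion between variants A and B (all counts invented):

```python
from math import sqrt
from statistics import NormalDist

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference of two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)             # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))    # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

lift, p_value = ab_test(conv_a=120, n_a=2400, conv_b=150, n_b=2350)
print(f"lift={lift:.4f}, p={p_value:.4f}")  # ship B only if significant
```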

----

Conclusions

  • DDRE provides a great opportunity to deliver value, because it is based on evidence
  • But it is not a hammer for every nail: you need good data, good techniques, etc. (DDRE needs data!)
  • Traditional methods still have a place, at least to start with
  • The role of traditional RE in the loop is a matter of debate
He showed a couple of slides with statements made at this conference that match what he is saying: in Jarke's keynote and in Storey's keynote.

