Best practices for development

As distinct from the software requirements, I wanted to collect best practices that can be taken into account during development. Many of them are practically self-evident for software development, but by making them explicit we may also be able to use them for other processes in which something is "developed". For example, I apply (1) to (5) of the trivia to every larger text I write for my studies.

Trivia

  1. use a version control system (agree on a workflow (Centralized / Feature branching / Gitflow / Forking) and make good / Conventional Commits)
  2. make a README
  3. keep a changelog (e.g. with Standard Version or chan)
  4. create meaningful versions (e.g. Semantic Versioning or Calendar Versioning)
  5. choose a license
  6. write down Architectural Decision Records
  7. write down your SOPs
  8. write documentation (Diátaxis Framework)
  9. automate everything: code linting, builds, tests, deployment
  10. make it reproducible: describe the environment (package versions etc.), write runbooks, make builds reproducible and bootstrappable

I also think it makes sense to collect these now because some of these points should be settled before more people join the development. Regarding point (5), for example, some software projects require a Developer Certificate of Origin for non-trivial changes (by convention, this can be signed quickly and easily with git's --signoff option). A Contributor Agreement beyond that, however, is rather unappealing to me and could perhaps be replaced by a few sentences in the project description. Another example is ZeroMQ's Collective Code Construction Contract (more context on that here). I find their collection of API interfaces, protocols, and processes excellent anyway. For API development, the OpenAPI Specification seems to have become the standard, and there is a generator available for it directly.

Handbooks

Specifically for software projects, there are several further collections of best practices:

Patterns and checklists

Here I particularly want to point to the Privacy Patterns, a pattern collection for mastering some recurring privacy-protection challenges in software development.

I have also added the FLOSS Best Practices Criteria of the Core Infrastructure Initiative.

And I just came across the literature collection "Free and Open Source Governance" and added it as well.

I have worked through the pattern collection privacypatterns.org / privacypatterns.eu and written up the (to me) most interesting patterns here. They are grouped into the following categories (after Jaap-Henk Hoepman: "Privacy Design Strategies"):

  • Inform
  • Control
  • Minimize
  • Hide
  • Enforce

@balkansalat might find this interesting; since it is quite technical, it is probably optional for @Marcus / @Marcus_temp :slight_smile:

Inform

Minimal Information Asymmetry

Prevent users from being disenfranchised by their lack of familiarity with the policies, potential risks, and their agency within processing.

Limit the amount of data needed to provide the services necessary to the users, and where appropriate, prefer less sensitive data to do so. Give users the option to opt in to features which require more data, but keep it minimal by default. If the amount of data needed is minimized, then users have less they need to understand, and less to disagree with.

Informed Consent for Web-based Transactions

This pattern describes how controllers can inform users whenever they intend to collect or otherwise use a user’s personal data.

  • Disclosure of purpose specification and limitation
  • Agreement and disagreement capabilities
  • Comprehension through easily understandable, comprehensive and concise explanations
  • Voluntariness showing that consent is freely given
  • Competence to make reasonable legally binding decisions
  • Minimal Distraction which may otherwise aggravate the user

Asynchronous notice

Proactively provide continual, recurring notice to consented users of repeating access to their personal data, including tracking, storage, or redistribution.

Whenever there is a context switch, sufficient duration, or random spot check, provide users with a simple reminder that they have consented to specific processing.

Awareness Feed

Users need to be informed about how visible data about them is, and what may be derived from that data. This allows them to reconsider what they are comfortable about sharing, and take action if desired.

  • Impactful Information and Feedback: Provide feedback about who a user will disclose their information to using certain privacy settings before that information is actually published.
  • Increasing awareness of information aggregation: Inform users about the potentially identifying effects of information aggregation to prevent them from unknowingly endangering their privacy.
  • Privacy Awareness Panel: Establish user awareness of the risks inherent in the disclosure of their data, whether to the controller themselves or to other users.
  • Appropriate Privacy Feedback: Supplies the user with privacy feedback, especially concerning that which is monitored and accessed, and by whom.
  • Who’s Listening: Alongside shared content, list the other users or unauthenticated persons who have accessed the same content and may access any further disclosures.

Privacy Mirrors

Disclosure awareness is needed to adequately manage digital identity. Provide the user of a system with a high level reflection on what personal data the system knows about, what access is given to others, and what kind of personal data can be deduced.

  • history of data flows: The past must be summarized in a way which is easy to understand, but still detailed enough to identify less obvious risks.
  • feedback regarding the state of their physical, social, and technical environment: There needs to be a way to disseminate this history, state, and flow information to the users without inducing notification fatigue and without exposing information which is not contextually acceptable.
  • awareness enabled by the feedback: This concept includes the user’s knowledge about how they feature in the system, how others feature with regards to the user’s personal data, as well as what capabilities and constraints entities are given.
    • Social: Making notable access patterns visible, and letting users correlate them with others’, encourages better decisions.
    • Technical: Understanding the limitations of the system, and the capabilities if used correctly, to use the system more effectively. Users should understand the flow, state, and history of their personal data in the system.
    • Physical: Having regard for the repercussions of their physical state, including location, being perceivable by the system.
  • accountability from these: When personal data is accessed, it should be clear who did so, and when – to both the person concerned and the one doing the accessing.
  • change enacted by the users: having the means to share less or more than others is important.

Personal Data Table

In order for users to see what information a controller has about them, they can be provided with a detailed tabular overview of that data upon request.

A detailed table provides the overview, which could show:

  • Which data
  • Why collected
  • How used/for which purpose collected
  • Who has access to the data
  • Who the user authorized for access
  • Which consent the user has given for specific data
  • To which parties the data is disclosed
  • Who has seen the data
  • Whether the data can be hidden
  • Whether the data can be removed
  • How long the data is stored
  • How datasets are combined to create richer (privacy sensitive) information. Note that this may violate local laws and regulations
    − With which other information the data is combined
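A minimal Python sketch of such an overview table; all field names, records, and the plain-text rendering are invented for illustration:

```python
# Hypothetical sketch of a Personal Data Table: for each stored datum,
# record why it was collected, who may access it, and how long it is kept.
def personal_data_table(records: list[dict]) -> str:
    """Render a per-user overview of stored personal data as plain text."""
    columns = ("data", "purpose", "access", "retention")
    widths = {c: max(len(c), *(len(str(r[c])) for r in records)) for c in columns}
    header = " | ".join(c.ljust(widths[c]) for c in columns)
    rows = [" | ".join(str(r[c]).ljust(widths[c]) for c in columns) for r in records]
    return "\n".join([header, "-" * len(header), *rows])

table = personal_data_table([
    {"data": "email", "purpose": "login", "access": "support team",
     "retention": "account lifetime"},
    {"data": "city", "purpose": "localization", "access": "nobody else",
     "retention": "until removed"},
])
print(table)
```

In a real system each row would of course be generated from the controller's actual records, including the consent given and the parties the data was disclosed to.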

Control

Reasonable Level of Control

Let users share selectively (push) and make available (pull) specific information to predefined groups or individuals.

Users should be able to push their chosen information to (or have it pulled by) those they grant access. Using push mechanisms, users will have the greatest level of control because they can decide the privacy level of their data case by case.

Pull mechanisms are less granular, as granting access to a group or individual continues until that access is denied. Within this time frame, the sensitivity of the data may fluctuate. However, the user should have the ability to retract access at will, and thus, can manage their own identified risks.

Personal Data Store

Data subjects keep control of their personal data, which is stored on a personal device.

One solution combines a central server with secure personal tokens. Personal tokens, which can take the form of USB keys, embed a database system, a local web server and a certificate for their authentication by the central server. Data subjects can decide on the status of their data and, depending on its level of sensitivity, choose to record it exclusively on their personal token or to have it replicated on the central server.

Decoupling [content] and location information visibility

Allow users to retroactively configure privacy for location information with respect to the content’s contextual privacy requirements.

Allow users to retroactively decide upon the disclosure of location information with respect to the context of the particular content. Record location information by default, but do not automatically share it.

Masquerade

Let users filter out some or all personal information they would otherwise provide to a service.

Allow users to select their desired identifiability for the context in question. They may reveal some subset of the interaction or account attributes and filter out the rest. […]

Two approaches could be considered: levels of publicity or publicity profiles.

In levels of publicity, all possibly revealed information could be arranged on a scale depending on how identifying each kind of information is alone or when shown together. A visual element could be used to select a specific publicity level. When the users select one level, all information with the same or smaller publicity level will be revealed. […]

In publicity profiles, all possibly revealed information could be depicted using visual elements and the users have to select each kind of information that they want to reveal. Furthermore, depending on the kind of information, the users could define different granularity for each one (E.g. regarding location it is possible to define the country, region, city, department and so on).
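The "levels of publicity" variant can be sketched in a few lines of Python; the attributes and the scale (1 = least identifying) are invented for illustration:

```python
# Hypothetical publicity scale: higher level = more identifying.
PUBLICITY_LEVEL = {
    "nickname": 1,
    "city": 2,
    "full_name": 3,
    "street_address": 4,
}

def reveal(profile: dict, chosen_level: int) -> dict:
    """Reveal only attributes at or below the selected publicity level."""
    return {k: v for k, v in profile.items()
            if PUBLICITY_LEVEL.get(k, float("inf")) <= chosen_level}

profile = {"nickname": "ada", "city": "Berlin",
           "full_name": "Ada L.", "street_address": "Example St. 1"}
```

Selecting level 2 would reveal only the nickname and the city; unknown attributes default to hidden.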

[Support] Selective Disclosure

Many services (or products) require the collection of a fixed, often large, amount of personal data before users can use them. Many users, instead, want to freely choose what information they share. This pattern recommends that services Support Selective Disclosure, tailoring functionality to work with the level of data the user feels comfortable sharing.

  • Anonymous Usage: At one extreme it may be possible to benefit from the system anonymously, though whether this is feasible will depend on the level of abuse anonymous usage might attract.
  • Assumption of Modesty: Where users choose to register, it should not be assumed that they wish to use all of the system’s services.
  • The Right to Reconsider: User decisions should be amendable.
  • Look Before You Leap: In situations where there are requirements for personal data, particularly when strict, users should be aware of this prior to their consent.

Private link

Enable sharing and re-sharing without wide public visibility or cumbersome authenticated access control.

Provide the user a private link or unguessable URL for a particular resource, such as a set of their personal information (e.g. their current location, an album of photos). Anyone who has the link may access the information, but the link is not posted publicly or guessable by an unintended recipient. The user can share the private link with friends, family or other trusted contacts who can in turn forward the link to others who will be able to access it, without any account authentication or access control lists.
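A sketch of generating such an unguessable URL with Python's standard library; the base URL is hypothetical:

```python
import secrets

def make_private_link(base_url: str) -> str:
    # token_urlsafe(32) encodes 32 random bytes (~256 bits), which is
    # infeasible to guess; the link itself is the only access control.
    return f"{base_url}/{secrets.token_urlsafe(32)}"

link = make_private_link("https://photos.example/album")
```

Note the trade-off the pattern implies: anyone who obtains the link (including via re-sharing) can access the resource.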

Minimize

User data confinement pattern

Avoid the central collection of personal data by shifting some amount of the processing of personal data to user-trusted environments (e.g. the users' own devices). Allow users to control exactly which data they share with service providers.

[…] instead of having the customer trust the service provider to protect their personal data, the service provider now has to trust the customers’ processing.

Strip Invisible Metadata

Strip potentially sensitive metadata that isn’t directly visible to the end user.

Stripping all metadata that is not directly visible during upload time, or during the use of the service can help protect services from leaks and liabilities. […]

Additionally when users share data with services, they can be presented with a preview of the data obtained by the service, including any metadata […].
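A toy sketch of the strip-plus-preview idea; the field names are invented, and a real implementation would strip e.g. EXIF tags from image files:

```python
# Hypothetical whitelist of fields that are directly visible to the user.
VISIBLE_FIELDS = {"filename", "caption"}

def split_upload(upload: dict) -> tuple[dict, dict]:
    """Return (what will be kept, the stripped metadata shown in a preview)."""
    kept = {k: v for k, v in upload.items() if k in VISIBLE_FIELDS}
    stripped = {k: v for k, v in upload.items() if k not in VISIBLE_FIELDS}
    return kept, stripped

kept, stripped = split_upload({
    "filename": "beach.jpg",
    "caption": "Holiday",
    "gps": (52.52, 13.40),        # invisible, potentially sensitive
    "camera_serial": "A1B2C3",
})
```

Showing the `stripped` part back to the user as a preview is what makes the invisible metadata visible before anything is published.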

Location Granularity

Support minimization of data collection and distribution. This is important when a service collects location data from or about a user, or transmits location data about a user to a third party.

Since much geographic data inherently has different levels of precision – like street, city, county, state, country – there may be natural divisions in the precision of location data. By collecting or distributing only the necessary level of granularity, a service may be able to maintain the same functionality without requesting or distributing potentially sensitive data.
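One simple way to implement granularity levels is rounding coordinates; the mapping of names to decimal places below is a rough, invented illustration (1 decimal ≈ 11 km, 2 ≈ 1.1 km, 3 ≈ 110 m at the equator):

```python
# Hypothetical granularity scale, coarsest to finest.
GRANULARITY = {"region": 0, "city": 1, "neighbourhood": 2, "street": 3}

def coarsen(lat: float, lon: float, granularity: str) -> tuple[float, float]:
    """Reduce coordinate precision to the requested granularity."""
    decimals = GRANULARITY[granularity]
    return round(lat, decimals), round(lon, decimals)
```

A service that only needs the city can then store or forward `coarsen(lat, lon, "city")` instead of the raw fix.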

Added-noise measurement obfuscation

Add some noise to service operation measurements, but make it cancel itself out in the long term.

A noise value is added to the true, measured value before it is transmitted to the service provider, so as to obfuscate it. The noise follows a previously known distribution, so that the best estimate for the result of adding several measurements can be computed, while an adversary would not be able to infer the real value of any individual measurement. Note that the noise need not be additive or Gaussian. In fact, these may not be useful for privacy-oriented obfuscation. Scaling noise and additive Laplacian noise have proved more useful for privacy preservation.
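A minimal sketch of zero-mean additive Laplace noise using inverse-CDF sampling; the scale parameter is an assumption here (in differential-privacy settings it would be derived from the measurement's sensitivity and the privacy budget):

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample zero-mean Laplace noise via the inverse CDF."""
    u = 0.0
    while u == 0.0:               # avoid log(0) at the boundary
        u = random.random()
    u -= 0.5                      # uniform in (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def obfuscate(measurement: float, scale: float = 1.0) -> float:
    """Report a noisy measurement; sums over many reports converge to the truth."""
    return measurement + laplace_noise(scale)
```

Because the noise has zero mean, averaging many obfuscated measurements approaches the true aggregate, while any single report stays ambiguous.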

Hide

Pseudonymous Messaging

A messaging service is enhanced by using a trusted third party that replaces the identifiers of the communication partners with pseudonyms.

A message is sent by a user to the server, which exchanges the sender’s address for a pseudonym. Replies are sent back to the pseudonymous address, which is then swapped back to the original.
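A toy sketch of the pseudonymising relay (the trusted third party); the addresses and the pseudonym format are invented for illustration:

```python
import itertools

class PseudonymRelay:
    """Swaps real sender addresses for stable pseudonyms and back."""
    def __init__(self):
        self._counter = itertools.count(1)
        self._alias_of = {}   # real address -> pseudonym
        self._real_of = {}    # pseudonym -> real address

    def forward(self, sender: str, recipient: str, body: str) -> dict:
        """Deliver a message with the sender's address replaced by a pseudonym."""
        alias = self._alias_of.get(sender)
        if alias is None:
            alias = f"anon-{next(self._counter)}@relay.example"
            self._alias_of[sender] = alias
            self._real_of[alias] = sender
        return {"from": alias, "to": recipient, "body": body}

    def reply(self, sender: str, pseudonym: str, body: str) -> dict:
        """Route a reply to a pseudonym back to the real address behind it."""
        return {"from": sender, "to": self._real_of[pseudonym], "body": body}
```

The relay is the single point that knows the mapping, which is exactly why it has to be trusted.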

Use of dummies

This pattern hides the actions taken by a user by adding fake actions that are indistinguishable from real ones.

This pattern is applicable when it is not possible to avoid executing, delaying or obfuscating the content of an action. […]

[…] simultaneously perform other actions in such a way that the adversary cannot distinguish real and fake (often called dummy) actions.

Anonymity Set

This pattern aggregates multiple entities into a set, such that they cannot be distinguished anymore.

There are multiple ways to apply this pattern. One possibility is to strip away any distinguishing features from the entities. If there are not enough entities, so that the anonymity set would be too small, we could even insert fake identities.
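The "strip distinguishing features" idea can be sketched as generalization over records; the size of the smallest resulting set is the usual k-anonymity measure. Field names and records below are invented:

```python
from collections import Counter

def generalize(records: list[dict], keep: tuple[str, ...]) -> list[tuple]:
    """Drop all fields except `keep`, collapsing records into anonymity sets."""
    return [tuple(r[f] for f in keep) for r in records]

def smallest_anonymity_set(records: list[dict], keep: tuple[str, ...]) -> int:
    """Size of the smallest group of indistinguishable records (the 'k')."""
    return min(Counter(generalize(records, keep)).values())

people = [
    {"name": "A", "city": "Berlin", "year": 1990},
    {"name": "B", "city": "Berlin", "year": 1990},
    {"name": "C", "city": "Hamburg", "year": 1985},
]
```

Keeping city and birth year still leaves one person uniquely identifiable here; stripping everything merges all three into one set.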

Enforce

Identity Federation Do Not Track Pattern

The Do Not Track Pattern makes sure that neither the Identity Provider nor the Identity Broker can learn the relationship between the user and the Service Providers the user uses.

Include an orchestrator component that acts on behalf of, and is controlled by, the user. The orchestrator makes sure that the identity broker cannot correlate the original request from the service provider with the assertions that are returned from the identity provider. The correlation can only be done within the orchestrator, but that is no issue because the orchestrator acts on behalf of the user, possibly on the user’s own device.

Sticky Policies

Machine-readable policies are attached to data to define allowed usage and obligations as it travels across multiple parties, enabling users to improve control over their personal information.

Service providers use an obligation management system. Obligation management handles information lifecycle management based on individual preferences and organisational policies. The obligation management system manipulates data over time, ensuring data minimization, deletion and notifications to data subjects.
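A toy sketch of a sticky policy: the policy travels with the data, and every party checks it before use. The purposes and fields are invented for illustration:

```python
def attach_policy(data: dict, allowed_purposes: set, delete_after_days: int) -> dict:
    """Bundle data with a machine-readable policy that travels with it."""
    return {"data": data,
            "policy": {"allowed_purposes": set(allowed_purposes),
                       "delete_after_days": delete_after_days}}

def use(package: dict, purpose: str) -> dict:
    """Release the data only if the stated purpose is allowed by the policy."""
    if purpose not in package["policy"]["allowed_purposes"]:
        raise PermissionError(f"purpose {purpose!r} not allowed by sticky policy")
    return package["data"]

package = attach_policy({"email": "a@example.org"}, {"billing"},
                        delete_after_days=30)
```

Real sticky-policy systems additionally encrypt the data so that the policy check is enforced cryptographically rather than by convention.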

Also interesting is the research project “Usable Security und Privacy by Design”, which provides principles, guidelines, and patterns for planning and implementing security features, as well as checklists for selecting software and more. Reading through it did not give me ideas as cool as reading the Privacy Patterns did, but they have created a nice visualization of the connections between their patterns:

[image: visualization of the connections between the patterns]

I have added some guides from Mozilla:

If the names put you off, take a look anyway; I found a lot of useful material there when writing the draft of our “Infos für Einsteiger*innen”.

TODO guides

These Open Source Guides are developed by the TODO Group in collaboration with The Linux Foundation and the larger open source community. They collect best practices from the leading companies engaged in open source development, and aim to help your organization successfully implement and run an open source program office. We expect these guides to be living documents that evolve via community contributions.