Definitions (The text below is more an introduction on the DI and its significance/potential)
Distributed Identity refers to a particular method of verifying one’s identity and personal attributes to a relying party.
Distributed Identity is mainly used for accessing online services, but there is no theoretical obstacle to presenting a digital identity in a non-virtual setting, such as physical entry at a gate.
Distributed Identity is most useful at first-time access to a service, like at registration, because it allows the user - the Holder - to share verifiable attestations and claims about their identity and attributes, which may have been self-asserted or granted. These, in turn, should be verified by the relying party. The relying party may then store some of the information gathered in a local account and issue local credentials, in which case distributed identity plays a lesser role in later interactions. However, it may also be the case that the claims are not stored, or that they need to be updated or maintained, and then Distributed Identity plays the same role in subsequent interactions as in the first one.
The attestations and claims are granted to a person by a community or authority, or by themselves. They are stored by the user on their device, typically in a wallet application or browser extension, and then presented to the relying party when accessing the service.
An advantage of distributed identity is that the user decides whether and what to release at any point. Their ability to control attribute release improves privacy and data protection.
The verification could use a digital infrastructure like digital signature verification on a signed piece of data or consultation with a registry or a distributed ledger.
Motivation
There are several advantages to be gained from a Distributed Identity system in a Research and Education (DI4R) setting:
- Better attribute aggregation: in a DI4R setting, attribute aggregation happens within the user wallet. This makes it possible to combine attributes from more sources, with the user in full control of their release.
- Easier integration for the identity and service (are both meant here?) providers: Providers need not federate - they can decide to provide or consume user information, or stop doing so, at any time, and it is only up to the user whether they want to provide attributes or not. (overlap with the later "SP is responsible for asking for only what it needs...")
- No tracking by IdP: In a SAML or OIDC setting, the Identity Provider can track in real-time where its users are logging in. In DI4R, the issuer cannot track any subsequent usage of the issued information and thus learn about the user's behaviour.
- Easier compliance with GDPR:
- The user holds control over cards and can easily delete them.
- For the IdP there is not much difference, except in terms of less control - the IdP cannot know/limit what happens with the credentials once issued - they can only track their inclusion into the user's wallet (including after a claim has expired or is revoked)
- The user's ability to control attribute release improves privacy and data protection.
- Not having a proxy! (in the long run? we go to lengths below with proxies!) is also a big advantage.
- The authorisation is decoupled from providing attributes.
- The service is responsible for asking only for what it needs and trusts, and is responsible for claims regarding verification, authorization and GDPR-compliant handling of released information.
- Easier in the ecosystem to exchange information without top-level trust root approval - basically mesh-like federation.
- Explanation: we came up with tagging in eduGAIN so that we don’t break the trust model, while entities can still express additional content.
Work Done
From Sprint Demo 4.6 - September 21/22:
- Implement and improve IRMA issuer in SimpleSAMLphp
- Test verification of claims from multiple schemes
- Explore the best way to describe the scheme
- Discuss IRMA ‘metadata’ distribution risks
- Investigate assurance
- Device assurance
- Expressing assurance from source
- Investigate revocation
- Multi-valued attributes
Below are all (unclassified) features, achievements, findings, issues, todos and questions from the Sprint Demo 4.6 - September 21/22 conclusions - keep in 3 or check against / move to 8, 2 or 10 (Future work); some items need extra elaboration:
- IRMA does improve end-user control over attributes [BMa: already mentioned]
- Tracking behaviour is indeed impossible [already mentioned]
- Is the app helpful or do we need to simplify GUI? [kinda already mentioned in 10 but not as simplification - "Enhanced presentation of cards"]
- Issuer chaining is still untested
- Per claim revocability (untested) [already mentioned]
- No fallback for the mobile app at this time
- No central infrastructure collects all user data [already mentioned]
- Not having a proxy reduces the administrative and legal burden [already mentioned]
- Once claims are issued, the Issuer is no longer involved, this improves scalability [kinda already mentioned]
- What is the legal/GDPR model, as ‘consent’ is not applicable
- Use of app adds to improved LoA
- LoA enhancing is much easier because of the mobile platform
- Service (and user!) can cherry-pick claims; unused data is not sent [already mentioned]
- A distributed Identity model may provide a more flexible ecosystem, while it can still have similar trust properties as we have with eduGAIN [already mentioned]
- Does an app provide us with better control over our ecosystem?
Functional Model
The comparative overviews provided here illustrate the transition toward distributed identities.
Sourcing of Claims
IRMA and Privacy-by-Design Federation
A Distributed Identity solution is already in use in the EU, mainly in the Netherlands: I Release My Attributes (IRMA) by the Privacy by Design Foundation.
Since the components of that system are available on GitHub, and several key components are already licensed under Apache 2.0, it was natural for us to experiment with it.
Therefore, from this point on, we specifically refer to IRMA as our prospective DI4R solution.
The source code of the system is available at:
https://github.com/privacybydesign
Technical Model
How does verification actually work in IRMA?
https://irma.app/docs/overview/
Source: IRMA documentation: https://irma.app/docs/what-is-irma/#irma-session-flow
Software components:
- Requestor backend and frontend: Generally the requestor runs a website with a (JavaScript) frontend in the user's browser and a backend server. During an IRMA session, the frontend displays the IRMA QR code that the IRMA app scans. All frontend tasks depicted in the diagram are supported by irma-frontend.
- IRMA server: Handles IRMA protocol with the IRMA app for the requestor.
- IRMA mobile app: Android, iOS.
Explanation of the steps:
- Usually, the session starts with the user performing some action on the website (e.g. clicking on "Log in with IRMA").
- The requestor sends its session request (containing the attributes to be disclosed or issued, or message to be signed) to the IRMA server. Depending on its configuration, the IRMA server accepts the session request only if the session request is authentic (e.g. a validly signed session request JWT) from an authorized requestor.
- The IRMA server accepts the request and assigns a session token (a random string) to it. It returns the contents of the QR code that the frontend must display: the URL to itself and the session token.
- The frontend (irma-frontend) receives and displays the QR code, which is scanned by the IRMA app.
- The IRMA app asks the IRMA server for the session request that the requestor submitted earlier, which contains the attributes to be disclosed or issued, or the message to be signed.
- The IRMA server returns the session request.
- The IRMA app displays the attributes to be disclosed or issued, or the message to be signed, and asks the user if she wants to proceed.
- The user accepts.
- The IRMA server performs the IRMA protocol with the IRMA app, issuing new attributes to the user, or receiving and verifying attributes from the user's IRMA app, or receiving and verifying an attribute-based signature made by the user's app.
- The session status (DONE, CANCELLED, TIMEOUT), along with disclosed and verified attributes or signatures depending on the session type, are returned to the requestor.
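The steps above can be sketched as a small in-memory simulation. This is only a toy model of the message flow, not the IRMA protocol: the class `ToyIrmaServer`, its method names, the `PENDING` placeholder state and the attribute identifier `demo.email` are all hypothetical, and no cryptographic proof is produced or verified.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Session:
    request: dict
    status: str = "PENDING"  # placeholder state, not an IRMA status name
    disclosed: list = field(default_factory=list)


class ToyIrmaServer:
    """In-memory stand-in for an IRMA server (hypothetical, no cryptography)."""

    def __init__(self, base_url):
        self.base_url = base_url
        self.sessions = {}  # session token -> Session

    def start_session(self, request):
        # Requestor -> server: register the request, get the QR contents back.
        token = secrets.token_urlsafe(16)
        self.sessions[token] = Session(request)
        return {"url": f"{self.base_url}/session/{token}", "token": token}

    def fetch_request(self, token):
        # App -> server: after scanning the QR code, retrieve the session request.
        return self.sessions[token].request

    def respond(self, token, accepted, wallet):
        # App -> server: the user's decision plus the requested attributes.
        session = self.sessions[token]
        wanted = session.request["attributes"]
        if not accepted or any(a not in wallet for a in wanted):
            session.status = "CANCELLED"
            return
        # Only the requested attributes are sent; the rest of the wallet stays private.
        session.disclosed = [{"id": a, "value": wallet[a]} for a in wanted]
        session.status = "DONE"

    def result(self, token):
        # Server -> requestor: final status and disclosed attributes.
        session = self.sessions[token]
        return {"status": session.status, "disclosed": session.disclosed}
```

A requestor would call `start_session`, show the returned URL and token as a QR code, and later poll `result`; the wallet side calls `fetch_request` and `respond`.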
Use of Proxies
Using DI4R through a proxy is a logical extension of the data sources available to a service, which is what the multi-protocol proxy was created for in the first place.
In this arrangement, from the SP's point of view, the Proxy is the sole source of all information.
The IRMA-to-SAML proxy allows for logging on to SAML SPs with IRMA cards.
The arrangement works the other way around too: SAML-to-IRMA proxy provides the possibility of using a SAML federated account to get IRMA cards.
The next two figures illustrate the internal structure of a deployment.
Idemix
IRMA implements the Idemix protocol for pseudonymous attribute handling. The Idemix protocol provides a way for users to use verifiable pseudonymous identifiers, coupled with certain attributes, at services without revealing their identity. With Idemix, this includes non-traceability across services, by virtue of the user having credentials targeted to each service.
The credentials are always issued by an Issuer Organization. The User contacts the Issuer and registers an account and establishes a pseudonym. If the user is eligible for certain attributes, a credential will be issued containing the pseudonym and the attributes to the user.
Then, the user can prove the possession of such credentials without actually revealing them to a verifier organization.
Use Cases (Generalized)
The following use case descriptions present some ideas of how the system may be used in an academic setting.
Issuer: SAML Attributes into IRMA Tokens
An obvious source of "cards" is a SAML federation. In order for a SAML attribute of a user to be converted to a card, the user needs to visit an entity that acts as a proxy. This proxy needs to behave as a SAML SP towards the user and the SAML federation. The user needs to visit the site with the intent of adding a card to their IRMA app so that the IRMA infrastructure can store the data as a card. The user will be logged in to this SAML SP which will consume the attributes from an IdP / AA then store them to the IRMA infrastructure.
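As a rough illustration of the conversion step, the sketch below maps the attributes received by the SAML SP onto the body of an IRMA issuance session request. The mapping table, the credential identifier `demo-scheme.demo-issuer.eduperson` and the 26-week validity are assumptions made for the example; the `@context`, `credentials`, `validity` and `attributes` fields follow the IRMA session request format as we understand it and should be checked against the IRMA documentation.

```python
import time

ISSUANCE_CONTEXT = "https://irma.app/ld/request/issuance/v2"
WEEK = 7 * 24 * 60 * 60

# Hypothetical mapping: SAML attribute name -> attribute name in the card.
SAML_TO_CARD = {
    "mail": "email",
    "eduPersonScopedAffiliation": "affiliation",
    "displayName": "name",
}


def issuance_request(saml_attributes,
                     credential="demo-scheme.demo-issuer.eduperson",
                     weeks_valid=26, now=None):
    """Build an issuance request body from a SAML attribute array."""
    now = int(time.time()) if now is None else now
    attributes = {}
    for saml_name, card_name in SAML_TO_CARD.items():
        values = saml_attributes.get(saml_name)
        if values:
            # Only the first value is kept; multi-valued attributes are open work.
            attributes[card_name] = values[0]
    return {
        "@context": ISSUANCE_CONTEXT,
        "credentials": [{
            "credential": credential,
            "validity": now + weeks_valid * WEEK,  # Unix timestamp of expiry
            "attributes": attributes,
        }],
    }
```

The proxy would submit this body to its IRMA server after the SAML login succeeds; SAML attributes missing from the assertion are simply left out of the card.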
Issuer: 'Native' Triple Stack IdP Issuing SAML, OIDC and IRMA
An authentication source may already have to support multiple protocols, (for instance, SAML and OIDC) in order to cater for the modern web environment. A logical extension of this idea is to support an additional protocol, the Card Issuer (is it how it is called, or 'IRMA card issuer protocol'?).
Issuer: Attribute Aggregation from Research AAI/MMS
In a traditional SAML flow, the following happens. The goal is to enable user Aladár (A) to manage the authorisation of user Béla (B) to service S, in such a way that this information is not maintained in S but in an external source, the Membership Management Service (MMS).
- A logs in on the web interface of the MMS, a SAML SP and an account is created.
- A creates a Virtual Organization / Community / Group - terminology depends on the actual tool, but let's call it a VO.
- A wants to invite B to his VO. In order to do this, he needs an email address for B. This email address serves as a trust anchor for the moment, therefore it needs to really belong to B and not be compromised.
- A sends an email invitation to B with a link containing a token. The email is sent by the MMS system.
- B follows the link to the web interface of the MMS and is prompted for login. B may already have a login (from previous participation in other VOs) or needs to create a new one. B may log in with a federated account, but it could be the case that there is none, and a local account is created or a VHO account is used. This scenario is made possible by the fact that it is really the access to the email inbox that provides the trust for the VO membership.
- After creating/accessing a local account, the token sent in the link is processed and B's account is now associated with the VO.
- B will eventually access a service that needs this membership information, commonly called entitlement.
- The service will perform a login flow and, using B's user identifier, query the MMS back-end (for instance, a SAML AA or another integration). This requires the same user identifier that was used at the MMS, typically from a common OIDC/SAML source.
- A may revoke the entitlement at any time, which will take effect at the next session: the service accessed will query the MMS and will not get the entitlement.
With the introduction of DI4R, the flow may be significantly simplified.
- A creates a VO at the MMS service
- A sends an invitation to B to the VO
- an email is sent to B from the MMS
- a card is registered to the registry?
- B visits the link and receives the card, which is added to the wallet.
- B visits a service that needs the entitlement and presents the card.
- (the card is verified in the common registry, therefore revocation is possible)
With this solution, B does not have to use the same login (i.e. the MMS and the target S do not need to be in the same federation). Possibly, B can receive the card at a page maintained by the DI4R provider.
Or, perhaps the DI4R provider's web interface serves as a landing page for the invitation:
IRMA Proxy as attribute aggregator
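The simplified DI4R invitation flow described above can be sketched with a toy registry. Everything here is hypothetical: the `ToyRegistry` class, its methods and the opaque card identifier stand in for a real card that would carry cryptographic proof, and the sketch assumes that a common registry exists, which the text above still treats as an open question.

```python
import secrets


class ToyRegistry:
    """Hypothetical common registry of entitlement cards (no cryptography)."""

    def __init__(self):
        self._cards = {}       # card id -> card contents
        self._revoked = set()  # ids of revoked cards

    def register(self, holder_email, vo, entitlement):
        # MMS side: A invites B, and a card for B is registered.
        card_id = secrets.token_hex(8)
        self._cards[card_id] = {
            "holder": holder_email,
            "vo": vo,
            "entitlement": entitlement,
        }
        return card_id

    def revoke(self, card_id):
        # A may withdraw the entitlement at any time.
        self._revoked.add(card_id)

    def verify(self, card_id, entitlement):
        # Service side: accept the card only if it is known, not revoked,
        # and carries the entitlement the service asks for.
        card = self._cards.get(card_id)
        return (
            card is not None
            and card_id not in self._revoked
            and card["entitlement"] == entitlement
        )
```

Note that the service only talks to the registry; it does not need to share a login source or federation with the MMS.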
Issuer: Journal Use Cases
In the academic peer review process, honest opinions from an expert in the field are crucial. There is an inevitable tendency for specialization in science because any modern problems can only be tackled in focused, career-long efforts, so in most subdisciplines, the researchers will have a tendency of knowing each other. This, however, presents a challenge for the review process. In order to overcome the challenge, in the most widely used review processes, a degree of anonymity is introduced.
- The "Single Blind" process is considered to be a minimum requirement - in this case, the author does not learn the identity of the reviewer. For most journals, this is considered insufficient, since the reviewers still know the identity of the author and may be biased in one way or the other. Yet, in some cases, especially in less common languages, there is no true alternative, as the content of the article drastically narrows down the set of possible authors, sometimes to one. In these cases the more anonymous methods are disingenuous.
- The "Double Blind" process means that neither the authors learn the identity of the reviewers nor the reviewers that of the authors. This is the most common type of peer review process. But it still leaves significant control in the hands of the editor, who knows the identity of both and who, due to the structure of the fields of science, may personally know all parties and have their own interests. The editor may also know the review styles of particular reviewers based on previous engagements. It is therefore possible, for instance, to pick a lenient or a strict reviewer for a given paper.
- The "Triple Blind" method prevents this problem, as the identities of the author, editor and reviewer are unknown to each other. However, this is the hardest to implement, as the editor still needs to be sure about the expertise of the reviewer and should also know that the author does not tamper with the process by being their own reviewer or bringing in friendly reviewers. At this point, the scientific process becomes somewhat analogous to e-voting systems.
- Furthermore, all three types of blind reviews have a common problem, which is that the work of the reviewer cannot be easily credited to them. This disincentivizes the reviewers from participating and therefore is a drawback for the entire scientific process.
In order to overcome these challenges, an editorial system could issue certificates for editing, reviewing and acceptance.
- The certificate of acceptance will contain the name of the author and the metadata of the article, therefore this can be handled as a simple card.
- The certificate of editing will also contain the name of the author and the edited issue, making it very similar to the certificate of acceptance.
- The certificate of review should also be connected to the person who did the review but it should not reveal what the review entailed. The way for doing that is to be in a large enough set of people so that the k-anonymity is sufficiently high. Otherwise, based on the exact timing and the fields of interest of a reviewer an author might be able to guess who did their review. Therefore, only larger time ranges (e.g. Year) should be revealed. Smaller journals may want to pool themselves together and issue a certificate that only says that the review was done in one of the journals in question.
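The coarsening rule for review certificates can be sketched as follows. The function name, the threshold `k=5` and the pool label are assumptions for illustration; the sketch reveals only the year of the review and falls back to a pooled venue name when fewer than `k` distinct reviewers worked for the same journal in that year.

```python
from datetime import date


def review_certificates(reviews, k=5, pool_name="pooled journals"):
    """reviews: iterable of (reviewer, journal, review_date) tuples."""
    reviews = list(reviews)

    # Count distinct reviewers per (journal, year) bucket.
    reviewers = {}
    for reviewer, journal, day in reviews:
        reviewers.setdefault((journal, day.year), set()).add(reviewer)

    certificates = []
    for reviewer, journal, day in reviews:
        bucket = (journal, day.year)
        # Name the journal only if the bucket hides the reviewer among >= k people.
        venue = journal if len(reviewers[bucket]) >= k else pool_name
        certificates.append({"reviewer": reviewer, "venue": venue, "year": day.year})
    return certificates
```

The sketch does not check that the pooled group itself reaches `k` reviewers; a real issuer would need that check as well before issuing pooled certificates.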
Virtual Home Organization
The Virtual Home Organization use case helps users who want to access research & education infrastructure but whose home organization is not technically integrated with the services being accessed. While the technical integration is missing, the user may have a completely valid claim on access. In these cases, a virtual home organization (VHO) is used. In this description, we present the sponsored VHO use case, in which one user (within the technical collaboration) sponsors another (outside the collaboration) by an invitation.
- User A (the sponsor, already in the collaboration) sends an email invitation to user B (outside the technical collaboration).
- A describes B on a form at the VHO and inputs the email address of B
- the email is sent out.
- User B visits the VHO service and receives a card that describes their identity as stated by A
- B visits the service, and reviews the data stated about her by A, and receives a card
- The card gets registered to the registry
- User B can now access services within the collaboration.
- B attempts to access the service
- the service verifies the card in the registry and allows access.
Verifier: Consume Holder's Credentials
Any entity that normally relies on an authentication flow that also aggregates attributes may use IRMA or another service for login. In this process, the user is challenged with a QR code to present attributes with the help of the wallet app. The wallet app reads the QR code and engages in user interaction: it shows what is requested by the service and which "cards" (previously stored attributes), if any, satisfy the request. Alternatively, in this flow, the user may acquire new cards to fulfil the request. The wallet then sends the attributes to the service, which can verify them with a background call.
With this method, the Verifier no longer trusts an IdP (something that is exposed on the public internet) but trusts the authentication to, and the possession of, the wallet. Arguably, this provides the opportunity for a stronger level of assurance (i.e. two factors: authentication to the wallet plus possession of the device).
Issues to Address and Discussion
Assurance
Since many sources can provide IRMA attributes, the IRMA platform does not standardise levels of assurance beyond individual profiles. Assurance levels are provided by using the corresponding schema-defined credential attributes, that is, IRMA passes on the level of assurance provided by sources only if these levels are incorporated into the used schema and implemented by the IRMA issuer.
For example, the attribute “assurancelevel” is used in schemas that provide data from passports or ID cards, and it conveys the levels set by the document issuer or by an intermediate entity that collected and verified the information provided with the credential. This level is in line with eIDAS. Some other schemes use “digidlevel” to provide the level from the Dutch Digital ID (digid.nl), which is the assurance with which identity is verified in the Dutch population register.
The user may select which of their available credentials they want to present to the verifier. The verifier can determine which attributes it does or does not accept from which sources. It can also state the required attribute bundles by using IRMA "Condiscons" (CONjunction of DISjunctions of CONjunctions), which allow verifiers to specify attribute sets coming from a single credential instance. With this, a service can require a composition of alternative bundles of attributes, even if they use different schemes to provide the relevant data and corresponding LoAs. However, the use of a consistent attribute schema and consistent semantics of levels may greatly simplify this selection, along with a mechanism informing verifiers about trustworthy issuers participating in such a schema.
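A condiscon can be pictured as three nested lists: the outer list is a conjunction (every entry must be satisfied), each entry is a disjunction of alternatives, and each alternative is a conjunction of attributes. The sketch below is a simplified model of that structure, not IRMA's own matching logic; the attribute identifiers and the `choose` helper are hypothetical.

```python
# Hypothetical request: (passport bundle OR DigiD bundle) AND an email address.
CONDISCON = [
    [
        ["demo-a.issuer.passport.assurancelevel", "demo-a.issuer.passport.name"],
        ["demo-b.issuer.digid.digidlevel", "demo-b.issuer.digid.name"],
    ],
    [
        ["demo-c.issuer.email.email"],
    ],
]


def choose(condiscon, available):
    """Pick one satisfiable alternative per disjunction, or None if impossible.

    available: set of attribute identifiers present in the wallet.
    """
    chosen = []
    for disjunction in condiscon:
        option = next(
            (bundle for bundle in disjunction if all(a in available for a in bundle)),
            None,
        )
        if option is None:
            return None  # no alternative of this disjunction can be satisfied
        chosen.append(option)
    return chosen
```

A wallet holding only the DigiD-style bundle and the email card would satisfy this request through the second alternative of the first disjunction, which is how two schemes with different LoA attributes can serve the same service.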
In support of assurance, the IRMA platform allows an optional validity period to be defined for a credential at issuance; if omitted, a default value of 6 months is assigned. The validity is always rounded down to the nearest week.
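The rounding rule can be written out as a short calculation. This is a sketch under two assumptions that are not stated above: that week boundaries are counted from the Unix epoch, and that "6 months" can be approximated as 26 weeks.

```python
WEEK_SECONDS = 7 * 24 * 60 * 60
DEFAULT_VALIDITY_SECONDS = 26 * WEEK_SECONDS  # assumption: 6 months ~ 26 weeks


def effective_expiry(issued_at, requested_expiry=None):
    """Return the expiry timestamp after applying the default and the rounding."""
    if requested_expiry is None:
        expiry = issued_at + DEFAULT_VALIDITY_SECONDS
    else:
        expiry = requested_expiry
    # Round down to a whole number of weeks since the epoch (assumed boundary).
    return expiry - expiry % WEEK_SECONDS
```

One consequence is that a credential can expire up to a week earlier than the issuer requested, which matters for short-lived entitlements.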
Another important supported mechanism is revocation which is described in more detail in the corresponding section.
Multi-Factor Authentication
When it comes to traditional authentication sessions, the need for separate authentication factors for high-stakes sessions has been long acknowledged.
Login names and passwords may fall prey to a hacked browser or operating system, in which case an independent channel - a one-time password challenge, a push notification on a secondary device, paper-based factors or even something as weak as an interceptable SMS can provide a crucial last line of defence.
Most of the currently trending second-factor options assume the primary channel to be a desktop or laptop computer and the mobile as a secondary device.
However, IRMA is a mobile-only application, and users typically do not have a second mobile device that could act as an independent source of authentication.
This is a drawback as no independent channel is provided for the user's access to their information. The information is stored on the user device, defended by a PIN, which, should the mobile operating system get exposed, will also be compromised, and the private information can be stolen.
At the same time, many factors alleviate this concern. For one, the mobile operating systems, while not being invulnerable, are much less exposed to malware according to statistics (TODO REF), probably due to their more controlled software/package management. This of course deteriorates once the operating system is no longer supported.
Another important factor to consider is the very nature of the distributed identity system - there are no huge amounts of sensitive data on any one device, instead, all the devices store their owner’s data only, making them much less valuable targets.
Moreover, the client devices normally don’t provide server functionality at all, that is, they are usually not addressable under a well-known DNS name, and there are no ports exposed.
Besides all this, a mobile device is much more personal than a desktop or laptop and is much more closely tied to the user. Also, because it provides the primary means of personal communication, a lost or stolen mobile device is reported much sooner, and its connection can be remotely deactivated.
This shows that the lack of a second-factor option in the case of a mobile wallet may be compensated by other factors.
Alternative Wallets
With any popular system that relies on a particular type of device there comes a point where the question arises: how to cater for those potential users who do not own the right kind of device?
An alternative wallet in this context means a non-smartphone based implementation of a wallet. While having the ability to use alternative wallets seems a necessity as the user base grows, a non-smartphone based implementation comes with several challenges.
One such challenge is QR code reading, for which the smartphone is especially well suited, but which is not impossible in another architecture either, e.g. a browser extension. Another challenge is the safe storage of the ‘cards’, but that can also be done.
Schema and Multi-Valued Attributes
TODO
Attribute Revocation
Revocation is enabled per credential type in the IRMA scheme. If it is enabled, a properly configured IRMA issuer will issue revocation-enabled credentials of that type. Even if the user has a revocation-enabled credential, proving non-revocation is not always required; the user can instead just disclose attributes from the credential, which is much cheaper. Non-revocation is still ensured by revocation update messages, which are created whenever an issuer performs a revocation. These messages also distribute issuer-related information that is updated at the time of revocation and is necessary to disclose attributes.
During attribute disclosures, IRMA can prove non-revocation, but only if explicitly asked for by the requestor. The reason is that computing a non-revocation proof for a credential is much more expensive than computing just a disclosure proof from that credential. Requestors should therefore request non-revocation proofs only when they really need to establish that the attributes they received have not been revoked.
Additional Source: https://irma.app/docs/revocation
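As an illustration of "only if explicitly asked for", the sketch below builds a disclosure request body and adds a non-revocation requirement only on demand. The `revocation` field listing credential types and the `@context` value reflect our reading of the IRMA session request format and should be verified against the documentation linked above; the helper name and the attribute identifier in the usage note are hypothetical.

```python
DISCLOSURE_CONTEXT = "https://irma.app/ld/request/disclosure/v2"


def disclosure_request(attribute_ids, require_nonrevocation=False):
    """Build a disclosure request asking for all given attributes together."""
    request = {
        "@context": DISCLOSURE_CONTEXT,
        # One disjunction with one alternative that contains every attribute.
        "disclose": [[list(attribute_ids)]],
    }
    if require_nonrevocation:
        # A credential type is the attribute identifier minus its last component.
        credential_types = sorted({a.rsplit(".", 1)[0] for a in attribute_ids})
        request["revocation"] = credential_types
    return request
```

For example, `disclosure_request(["demo-scheme.demo-issuer.membership.vo"], True)` would ask for the attribute and for proof that the `demo-scheme.demo-issuer.membership` credential has not been revoked, while the default call leaves the cheaper plain disclosure in place.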
Development Work and Demos
IRMA Issuer Setup
The IRMA issuer consists of a small PHP server that relies on SimpleSAMLphp for authentication. On success, authentication yields a populated attributes array, which is fed into the IRMA daemon's session request API to start an issuance session; the result is handed over to the JavaScript handlers. The JavaScript then contacts the IRMA daemon using the result of the issuance session request and shows the outcome.
TODO figure
IRMA Verifier Setup
The IRMA verifier is based on the SimpleSAMLphp framework and implemented as an authsource. The authsource shows a web form and creates a disclosure session request using the IRMA daemon API. The result of this request is handed over to the JavaScript handler; on receiving a successful disclosure response, the form is POSTed back to the SimpleSAMLphp authIRMA handler and processed further as a valid authentication.
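The last step, turning a disclosure result into an attributes array for the authentication framework, might look roughly like the sketch below. It is written in Python for illustration rather than in the PHP of the actual authsource, and the result field names (`status`, `proofStatus`, `disclosed`, `id`, `rawvalue`) are our understanding of the IRMA server's session result and should be treated as assumptions.

```python
def attributes_from_result(result):
    """Return {attribute id: [values]} for a valid disclosure, else None."""
    # Accept only finished sessions whose proofs were verified successfully.
    if result.get("status") != "DONE" or result.get("proofStatus") != "VALID":
        return None
    attributes = {}
    # "disclosed" is assumed to be a list of lists of attribute objects.
    for bundle in result.get("disclosed", []):
        for attribute in bundle:
            attributes.setdefault(attribute["id"], []).append(attribute["rawvalue"])
    return attributes
```

Returning a mapping from attribute name to a list of values matches the shape of the attributes array that SimpleSAMLphp passes on after authentication.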
TODO figure
Future Work
- Multi-valued attributes
- Alternative wallets
- Scalable schema definition for a federation of issuers the size of eduGAIN
- 5k or more entities
- Peer-to-peer claims (cards)
- Pixie dusting - claiming that someone is your co-worker, club member, etc.
- Conventions on prefixes for wildcards used on attribute names
- Use of multiple schemas and schema selector
- UX is not hard but needs to be done well
- allows for different universes of DI4R
- Enhanced presentation of cards
- Since the user is in charge of exposing cards in their wallet to the service, it is important to present these cards, their content and their source in a clear but informative way. This requires further establishing of a standardised and scalable way to specify their presentation and access to supporting information which is also interoperable with existing identity infrastructures and trust frameworks.
- Usability testing/evaluation
- DI4R is a new concept, so it is a reasonable question whether users understand the flow at all, as well as the benefits that justify adopting the changes. This should be evaluated with appropriate early adopters, such as researchers involved in Open Science.
Leftover Diagrams and Material