Titre / Title: Collaboration over a distributed file system

Projet Ciblé (PC) / Targeted Project (PC): PC2 PILOT – Long-term collaboration / Practices and infrastructure for long-term collaboration

Commentaires / Comments:

Etat du sujet / State of the subject: Disponible / Available

Date de publication / Publication date: 13/04/2025

Institution de rattachement / Institutional affiliation: Inria

Résumé / Abstract: File system services are essential for data sharing and collaboration among users. Most collaborative file system services, such as Google Drive and Dropbox, rely on a central authority and place personal information in the hands of a single large corporation, which is perceived as a privacy threat. Users must hand their data over to the vendors of these services and trust them to preserve its privacy, yet they have little control over how their data is used once it has been shared with other users. Moreover, centralising the platforms that host these services makes scalability and reliability very costly. These services often limit the number of people who can simultaneously modify shared data, they generally rely on costly infrastructure, and they do not allow infrastructure and administration costs to be shared. Centralisation is also ill-suited to collaboration among a federation of organisations that want to keep control over their data and do not want to store it with a third party.

A collaborative file system has to support hybrid collaboration including several collaboration modes:

  • connected, where user modifications are immediately shared with and visible to the other users
  • disconnected, where users are not connected to the network and their modifications are transmitted to the other users upon reconnection
  • ad-hoc, where subgroups of users can work together and synchronise with the other members of the group at a later time

Additionally, collaboration over the file system has to be secure and offer suitable access control. Multiple administrators, whose set can change dynamically, should be able to modify users' access rights to the shared file system.

We want to build a distributed collaborative file system in which control over the data is given to users, who can share it directly and only with the users they trust, without having to store it with a central authority. The distributed collaborative file system has to support the collaboration modes mentioned above and switch seamlessly from one mode to another. Additionally, it has to offer suitable dynamic access control. Data replication algorithms have to be reliable (i.e., once all modifications have been received, the replicas have to converge) and explainable (i.e., the decisions taken by these algorithms have to be understandable to users and have to respect their intentions). These algorithms have to be suitable for a large community of users that produces a large number of modifications at a high frequency.

As a data replication mechanism, we propose to use CRDTs (Conflict-free Replicated Data Types) [1], which guarantee Strong Eventual Consistency, a property that ensures convergence as soon as every replica has integrated the same modifications, without further message exchange among replicas. CRDTs are suitable for end-to-end encryption in a peer-to-peer environment, where data is decrypted only on the receiver side and conflicts can be resolved locally. There is therefore no need to decrypt data in transit, as is the case for centralised architectures, where servers require unencrypted data in order to perform merging. There are two main families of CRDTs: state-based and operation-based [1]. They differ in the way payloads are defined, i.e., in how updates are shared. The payload of a state-based CRDT contains the whole state of the data, while the payload of an operation-based CRDT carries only a single update. In the context of hybrid collaboration including connected, disconnected and ad-hoc modes, sending the entire document state after each modification would be inefficient. Operation-based CRDTs are therefore more suitable for our targeted use case.
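The difference between the two payload styles can be illustrated with a minimal sketch. It uses a grow-only set of file names, one of the simplest CRDTs, and is not the data type envisaged for the thesis. The class names StateBasedSet and OpBasedSet are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StateBasedSet:
    """Hypothetical grow-only set replicated by shipping the whole state."""
    items: set = field(default_factory=set)

    def add(self, item):
        self.items.add(item)
        return frozenset(self.items)  # payload: the full state

    def merge(self, payload):
        # Set union is commutative, associative and idempotent, so replicas
        # converge whatever the delivery order, even with duplicates.
        self.items |= payload


@dataclass
class OpBasedSet:
    """Hypothetical grow-only set replicated by shipping single operations."""
    items: set = field(default_factory=set)

    def add(self, item):
        self.items.add(item)
        return ("add", item)  # payload: one update only

    def apply(self, op):
        kind, item = op
        if kind == "add":
            self.items.add(item)


# Two replicas work while disconnected, then exchange their operations.
a, b = OpBasedSet(), OpBasedSet()
ops_a = [a.add("report.txt"), a.add("notes.md")]
ops_b = [b.add("budget.csv")]
for op in ops_b:
    a.apply(op)
for op in reversed(ops_a):  # delivered in a different order than issued
    b.apply(op)
assert a.items == b.items  # both replicas hold the same three file names

# The state-based variant sends everything it knows on each update.
s, t = StateBasedSet(), StateBasedSet()
s.add("report.txt")
payload = s.add("notes.md")  # contains both file names, not just the new one
t.merge(payload)
assert t.items == s.items
```

In this sketch the operation-based payload stays constant in size, while the state-based payload grows with the set. Adding to a set is idempotent, so duplicate delivery is harmless here; operation-based CRDTs in general make stronger assumptions about message delivery.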

Several works have proposed CRDTs for file systems [2,3] or for trees [4,5]. Most of them rely on state-based CRDTs. For the solutions relying on operation-based CRDTs, it remains to be investigated whether the proposed merging semantics satisfy user intentions. None of the proposed collaborative file systems offers security mechanisms including access control. To avoid relying on a central server that stores access rights, we propose that access rights be replicated in addition to the data. We want to propose CRDTs for managing replicated file systems and replicated access control by integrating the solution proposed in [6].
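Why merging semantics for a replicated tree are delicate can be seen in a small example with two concurrent directory moves. The sketch below is an illustration written for this description, not the approach of [2-5] or [6]. The helper names would_create_cycle and apply_move are hypothetical, and the tree is assumed to be stored as a child-to-parent map.

```python
def would_create_cycle(parent, node, new_parent):
    """Return True if moving node under new_parent would create a cycle.

    Assumes the current parent map is acyclic.
    """
    current = new_parent
    while current is not None:
        if current == node:
            return True
        current = parent.get(current)
    return False


def apply_move(parent, node, new_parent, guard=False):
    """Move node under new_parent; with guard=True, skip cycle-creating moves."""
    if guard and would_create_cycle(parent, node, new_parent):
        return False
    parent[node] = new_parent
    return True


# Initial tree: root contains the sibling directories "a" and "b".
initial = {"root": None, "a": "root", "b": "root"}

# Concurrent operations: one user moves a into b, another moves b into a.
move_1 = ("a", "b")
move_2 = ("b", "a")

# Applying both moves blindly leaves a and b pointing at each other,
# detached from the root.
naive = dict(initial)
apply_move(naive, *move_1)
apply_move(naive, *move_2)
assert naive["a"] == "b" and naive["b"] == "a"

# A local cycle check keeps each replica a valid tree, but the outcome
# depends on the order in which the two moves are received.
replica_x = dict(initial)
apply_move(replica_x, *move_1, guard=True)
apply_move(replica_x, *move_2, guard=True)

replica_y = dict(initial)
apply_move(replica_y, *move_2, guard=True)
apply_move(replica_y, *move_1, guard=True)

assert replica_x != replica_y  # the replicas diverge
```

A local check alone is therefore not enough: the replicas also need a deterministic rule for deciding which of the concurrent moves takes effect, and that rule should match what users expect.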

We propose to integrate our solution into MUTE [7], our peer-to-peer collaborative editor. A user evaluation will be carried out to test suitable solutions for resolving conflicting changes to the file system, as well as conflicts between file system changes and access control rights. The implementation of the proposed solution in MUTE will also be tested with users to evaluate its acceptability.

Détails du sujet / Subject details: PDF

Directeur / directrice de thèse / Main advisor: Claudia Ignat (claudia.ignat@inria.fr) – COAST team, Centre Inria de l’Université de Lorraine

Encadrant(e) de thèse / Secondary advisor: Gérald Oster (gerald.oster@loria.fr) – COAST team, Université de Lorraine

Autre Encadrant(e) de thèse / Additional advisor:

Pour faire acte de candidature sur ce sujet, veuillez écrire aux auteurs directement / To apply for this subject, please write directly to the authors