DeAI: decentralized artificial intelligence

Data Process

Data processing is a natural and effective way to reduce privacy leakage when preparing pretraining corpora and fine-tuning instruction datasets. It involves masking or filtering personally identifiable information (PII) and other sensitive data. However, such measures can inadvertently reduce data diversity and cause information loss, potentially weakening the capabilities of large language models (LLMs).
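
As a rough illustration of the two options named above, the sketch below masks a few PII types with typed placeholders and, alternatively, filters out documents that contain any match. The regex patterns and the helper names `PII_PATTERNS`, `mask_pii` and `filter_documents` are hypothetical; production pipelines typically need much broader coverage (names, addresses, ID numbers), often with a named-entity recognition model rather than regexes alone.

```python
import re

# Hypothetical, deliberately minimal PII patterns (an assumption, not a standard list).
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def mask_pii(text: str) -> str:
    """Masking: replace every matched PII span with a typed placeholder like [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def filter_documents(docs, max_pii_hits=0):
    """Filtering: drop documents with more than max_pii_hits PII matches."""
    kept = []
    for doc in docs:
        hits = sum(len(pattern.findall(doc)) for pattern in PII_PATTERNS.values())
        if hits <= max_pii_hits:
            kept.append(doc)
    return kept


print(mask_pii("Contact jane.doe@example.com or 555-123-4567."))
```

Typed placeholders keep some sentence structure intact, which loses less information than dropping whole documents. Filtering is the stricter choice and shows the diversity trade-off described above most directly.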

Data anonymization techniques such as k-anonymity, l-diversity, and t-closeness can also be used to remove private information from data. However, they may not fully protect against membership inference attacks, because identifying signals other than PII can still reveal data providers.
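
To make the first two criteria concrete, here is a minimal checker sketch, assuming records are dictionaries whose quasi-identifier columns (here a generalized ZIP code and age range) have already been coarsened. The function names and sample data are hypothetical, and t-closeness, which compares each group's sensitive-value distribution with the overall one, is not covered. The example shows a related weakness: a table can satisfy 2-anonymity yet still disclose a sensitive value when every record in a group shares it.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share identical values on all quasi-identifier columns."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """k-anonymity: every equivalence class contains at least k records."""
    groups = equivalence_classes(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l_min):
    """Distinct l-diversity: every class holds at least l_min distinct sensitive values."""
    groups = equivalence_classes(records, quasi_identifiers)
    return all(len({rec[sensitive] for rec in group}) >= l_min for group in groups.values())


records = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cancer"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
]
QI = ["zip", "age"]
print(is_k_anonymous(records, QI, 2), is_l_diverse(records, QI, "disease", 2))
```

Here the table is 2-anonymous, but the second group is not 2-diverse: anyone known to be in that ZIP and age range is revealed to have flu.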

Deduplication is another simple and effective method: it can improve model quality while also mitigating the privacy risks associated with training data extraction and membership inference attacks.
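
A minimal deduplication sketch follows. It combines exact matching on a SHA-256 hash of normalized text with near-duplicate detection by Jaccard similarity over word trigrams. The normalization rule, the 0.8 threshold and the helper names are illustrative assumptions. The pairwise comparison is quadratic in the number of kept documents, so corpus-scale pipelines usually approximate it with MinHash and locality-sensitive hashing instead. The privacy intuition is that text repeated many times in a corpus is more likely to be memorized and later extracted.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text: str, n: int = 3) -> set:
    """Set of word n-grams used for near-duplicate comparison."""
    words = normalize(text).split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(docs, threshold=0.8, n=3):
    """Keep the first occurrence of each exact or near-duplicate document."""
    seen_hashes = set()
    kept, kept_shingles = [], []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        doc_shingles = shingles(doc, n)
        if any(jaccard(doc_shingles, other) >= threshold for other in kept_shingles):
            continue  # near duplicate of a document already kept
        seen_hashes.add(digest)
        kept.append(doc)
        kept_shingles.append(doc_shingles)
    return kept


docs = [
    "The quick brown fox jumps over the lazy dog",
    "the  quick brown fox jumps over the lazy dog",
    "The quick brown fox jumps over the lazy dog today",
    "Completely different sentence about privacy",
]
print(deduplicate(docs))
```

In this example the second document is an exact duplicate after normalization. The third shares 7 of its 8 trigrams with the first (Jaccard 0.875), so only the first and last documents are kept.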


Last updated 11 months ago