AI / ML

Training a 1 Trillion Parameter Model With PyTorch Fully Sharded Data Parallel on AWS

2022-03-15
Emsi


Linear scaling efficiency is observed as the number of GPUs increases from 8 to 512.
Read more at Medium…
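The article covers PyTorch's Fully Sharded Data Parallel (FSDP), which shards parameters, gradients, and optimizer state across ranks instead of replicating them, gathering full parameters only for the duration of each layer's compute. As a rough illustration of the API involved, here is a minimal sketch of wrapping a model in FSDP, assuming a torchrun-style launch with one process per GPU; the toy model, hyperparameters, and training loop are illustrative placeholders, not the article's actual 1-trillion-parameter setup:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # One process per GPU, launched e.g. with:
    #   torchrun --nproc_per_node=8 train.py
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy stand-in for a large transformer stack.
    model = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.GELU(),
        nn.Linear(4096, 1024),
    ).cuda()

    # FSDP shards parameters, gradients, and optimizer state
    # across all ranks; each rank holds only its own shard
    # outside of a layer's forward/backward pass.
    model = FSDP(model)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()

    for step in range(10):
        x = torch.randn(8, 1024, device="cuda")
        loss = loss_fn(model(x), x)  # dummy reconstruction objective
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

At trillion-parameter scale, a plain FSDP(model) wrap is not enough on its own; techniques such as per-layer auto-wrapping policies, activation checkpointing, and CPU offload (all supported by FSDP) are typically combined with it, which is the territory the linked article explores.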
