Hacker News Re-Imagined

YaLM-100B: Pretrained language model with 100B parameters

  • 736 points
  • 12 days ago

  • Posted by @f311a


@londons_explore 12 days

For those of us without 200GB of GPU RAM available... how feasible is it to do inference by loading it from SSD?

Would you have to scan through all 200GB of data once per character generated? That doesn't actually sound too painful - 1 minute per character seems kinda okay.

And I guess you can easily do lots of data parallelism, so you can get 1 minute per character on lots of inputs and outputs at the same time.
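
A minimal sketch of that idea, streaming one layer's weights off disk at a time so only a single layer is ever resident in GPU memory. The toy dimensions, per-layer files, and Linear stand-in below are illustrative assumptions, not YaLM's actual checkpoint layout:

    import os, tempfile
    import torch
    import torch.nn as nn

    DIM, N_LAYERS = 1024, 8
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Pretend these per-layer files already sit on the SSD.
    tmpdir = tempfile.mkdtemp()
    paths = []
    for i in range(N_LAYERS):
        path = os.path.join(tmpdir, f"layer_{i:02d}.pt")
        torch.save(nn.Linear(DIM, DIM).state_dict(), path)
        paths.append(path)

    def forward_streaming(x, paths):
        # Load each layer from disk, run it, then free it before the next one.
        for path in paths:
            layer = nn.Linear(DIM, DIM)
            layer.load_state_dict(torch.load(path, map_location="cpu"))
            layer.to(device)
            x = layer(x)
            del layer
        return x

    print(forward_streaming(torch.randn(1, DIM, device=device), paths).shape)

The disk read dominates each step, which is where the minute-per-token estimate comes from; batching many prompts through each loaded layer amortizes that same read, as the data-parallelism point above suggests.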

@MetroWind 11 days

Looking at the comments, I just don't think you 1st world people know what censorship and propaganda look like...

@egorfine 12 days

I have huge respect for developers at Yandex. It's kind of sad that achievements like these are tainted by the fact that they come from Russia (and I speak as a Ukrainian). I wonder if the permissive license is able to mitigate that.

@throwaway4good 12 days

What sort of machine can run this model?

@SXX 12 days

First of all, regardless of the political situation, this is a great step toward making ML research actually open. So huge thanks to the developers who pushed to make it public. Still...

Yandex does in fact share responsibility for the Russian government's actions. While it is impossible for them to fight censorship, they could certainly have shut down their News service completely.

Yandex could also certainly move more of their company and staff out of the country. It was their deliberate choice to stay in Russia and to gain advantages in the local market by using their political weight.

@idealmedtech 12 days

In the download script, it skips parts of the model (02 and 83); do any ML people have ideas why you'd do that?

@wongarsu 12 days

Now we just need someone to figure out how to compress the model to get similar performance in 10B parameters.

I assume some of the services that offer GPT-J APIs will pick this up, but it doesn't look cheap or easy to get this running.

@alexb_ 12 days

I have to wonder if 10 years down the line, everyone will be able to run models like this on their own computers. Have to wonder what the knock-on effects of that will be, especially if the models improve drastically. With so much of our social lives being moved online, if we have the easy ability to create fake lives of fake people one has to wonder what's real and what isn't.

Maybe the dead internet theory will really come true; at least, in some sense of it. https://www.theatlantic.com/technology/archive/2021/08/dead-...

@lostmsu 12 days

I downloaded the weights and made a .torrent file (also a magnet link, see the raw README.md). Can somebody else who downloaded the files double-check the checksums?

https://github.com/lostmsu/YaLM-100B/tree/Torrent
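
For anyone double-checking, a quick way to hash the downloaded shards and compare against the published list (the directory name and the choice of MD5 below are guesses on my part; swap in whatever the README actually uses):

    import glob, hashlib, os

    # Hash every downloaded file; compare the output against the checksum list.
    for path in sorted(glob.glob("yalm100b_checkpoint/**/*", recursive=True)):
        if not os.path.isfile(path):
            continue
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        print(h.hexdigest(), path)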

@lukestateson 12 days

1. Yandex supports the Russian Terrorist regime.

2. Yandex News service ignores the genocide currently happening in Ukraine.

3. Yandex Search engine hides the pictures of Bucha and Irpin massacre as well as Kharkiv and Mariupol destruction.

Yandex is using whitewashing tactics via open source.

@htrp 12 days

> It was tested on 4 (A100 80g) and 8 (V100 32g) GPUs, but is able to work with different configurations with ≈200GB of GPU memory in total which divide weight dimensions correctly (e.g. 16, 64, 128).

so we're looking at crazy prices just for inference. RIP to the cloud billing account of the first person who makes this public
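
For a rough sense of why ~200GB is the floor here (fp16 storage is my assumption; activations and the attention cache need headroom on top of this):

    params = 100e9
    bytes_per_param = 2                          # fp16 weights
    print(params * bytes_per_param / 1e9)        # ~200 GB for the weights alone
    print(4 * 80, 8 * 32)                        # 320 GB on 4x A100, 256 GB on 8x V100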

@ketzu 12 days

Seeing these gigantic models makes me sad that even the 4090 is supposed to top out at 24GB of VRAM. I really would like to be able to run and experiment on larger models at home.

@justinzollars 12 days

What is the TL;DR on this model? What exactly does it do? It's not clear from the source examples.

@braingenious 12 days

This is one of the funniest threads I’ve ever seen on this website. People are yelling at each other about the CIA and the legitimacy of Israel and Assange and the definition of fascism and… anything that pisses anybody off about international politics in general. In a thread about a piece of software that’s (to me and likely many others) prohibitively expensive to play around with.

Anyway I hope somebody creates a playground with this so I can make a computer write a fan fiction about Kirby and Solid Snake trying to raise a human baby on a yacht in the Caspian Sea or whatever other thing people will actually use this for.

@option 12 days

Did they bias it toward ru propaganda talking points?

Edit: I would like to see more details about the training data, beyond its size and languages (en, ru). For example, did they use their own Yandex.News (a cesspool of propaganda)?

@m00dy 12 days

well, I can call this "the real open ai".

@obituary_latte 12 days

What are some use cases for something like this? I understand it says "generating and processing text", but is it a replacement for OCR? Or something else?

@londons_explore 12 days

The download fails because the vocab file link returns HTTP 403... :-(

https://yalm-100b.s3.yandex.net/vocab/voc_100b.sp

EDIT: It seems fine if you download with a browser user agent rather than curl... I guess I just got hit by some anti-bot thing they have accidentally turned on.
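
A workaround sketch in Python that sends a browser-style User-Agent (the header string is just an example; curl's -A flag does the same thing):

    import requests

    url = "https://yalm-100b.s3.yandex.net/vocab/voc_100b.sp"
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"}

    resp = requests.get(url, headers=headers, timeout=120)
    resp.raise_for_status()
    with open("voc_100b.sp", "wb") as f:
        f.write(resp.content)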

@sandGorgon 12 days

Is this the first GPT-like model which is fully open source? None of the others are, right?

@manishsharan 12 days

Is there a way for developers who do not have an AI/ML background to get started using this? I have been curious about GPT-3, but I do not have any AI/ML experience or knowledge. Is there an "approachable" course on Coursera or Udemy that could help me get started with technologies like GPT?

@amai 12 days

Is that the model used by the Russian government to generate fake news?

@edf13 12 days

Wonder what the split is between Russian and English in the model?

@ma2rten 12 days

I am one of the people who worked on Google's PaLM model.

Having skimmed the GitHub readme and medium article, this announcement seems to be very focused on the number of parameters and engineering challenges scaling the model, but it does not contain any details about the model, training (learning rate schedules, etc.), or data composition.

It is great that more models are getting released publicly, but I would not get excited about it before some evaluations have been published. Having a lot of parameters should not be a goal in and of itself. For all we know this model is not well trained and worse than Eleuther AI's 20B parameter model, while also being inconveniently large.

@MichaelRazum 12 days

It's just crazy how much it costs to train such models. As I understand it, 800 A100 cards would cost about $25,000,000, not counting the energy costs for 61 days of training.
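
As a back-of-the-envelope check on that figure (the per-card price is an assumption of mine, not something stated in the thread or the release):

    cards = 800
    price_per_a100 = 30_000                   # USD, rough per-card price (assumption)
    print(f"${cards * price_per_a100:,}")     # $24,000,000 before power and hosting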

@narrator 12 days

I love Yandex. They are by far the best search engine for politically controversial topics. They also release a language model to benefit everyone, even if it says politically incorrect stuff. They also name their projects "cocaine", perhaps to prevent western competitors from using them.

You look at OpenAI and how they don't release their models mainly because they fear "bad people" will use them for "bad stuff." This is the trend in the west. Technology is too powerful, we must control it! Russia is like... Hey, we are the bad guys you're talking about so who are we keeping this technology from? The west has bigger language models than we do, so who cares. Also their attitude to copyright and patents, etc. They don't care because that's not how their economy makes money. Cory Doctorow's end of general purpose computing[1] and locked down everything is very fast approaching. I'm glad the Russians are around and aren't very interested in that project.

[1]https://csclub.uwaterloo.ca/resources/tech-talks/cory-doctor...

@joshsyn 12 days

Yandex > Google.

@lumost 12 days

To add a voice of skepticism: the recent rush to open-source these models may indicate that the tens of millions spent training them has a relatively poor ROI. There may be a hope that someone else figures out how to make them commercially useful.

@pembrook 12 days

Side note: Yandex search is awesome, and I really hope they stay alive forever. It's the only functional image search nowadays, after our Google overlords neutered their own product out of fear over lawyers/regulation and a disdain for power users.

You can't even search for images "before:date" in Google anymore.

@upupandup 12 days

Does anybody want to crowd fund the training?

@schizo89 12 days

I hope one day it will be possible to run these kinds of models at home.

@kome 12 days

I agree, Yandex is a great search engine.
