What data was used to train Sora?
We used publicly available data and licensed data.
So, videos on YouTube?
I'm actually not sure about that.
OK, videos from Facebook? Instagram?
You know if they were publicly available, um yeah, publicly available to use there might be the data but I'm not sure. I'm not confident about it.
What about Shutterstock? I know you guys have a deal with them.
I'm just not gonna go into the details of the data that was used but it was publicly available or licensed data.
EDIT: Please help, can't figure out how to preserve line breaks.
Edit: Improved it a bit.
A line ending (not in a code span or HTML tag) that is preceded by two or more spaces and does not occur at the end of a block is parsed as a hard line break.
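In case it helps anyone else with the same problem, here is a minimal sketch of applying that rule automatically. It assumes plain text with one utterance per line, and the helper name `add_hard_breaks` is made up for illustration. It does not special-case code spans or fenced blocks, so it is only meant for simple transcripts like the one above.

```python
def add_hard_breaks(text: str) -> str:
    """Append two trailing spaces to each line that is followed by
    another non-blank line, so Markdown renders a hard line break.

    Hypothetical helper: the last line of each block and blank lines
    are left untouched, matching the quoted rule.
    """
    lines = text.split("\n")
    out = []
    for i, line in enumerate(lines):
        # The next line, or an empty string if this is the final line.
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.strip() and nxt.strip():
            # Inside a block: normalize trailing whitespace to two spaces.
            out.append(line.rstrip() + "  ")
        else:
            # Blank line or end of block: no hard break is needed.
            out.append(line)
    return "\n".join(out)
```

Pasting the transcript through `add_hard_breaks` before posting should keep each question and answer on its own line without needing a blank line between them.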