It's a multi-modal model, which is supposed to learn and predic images and texts simultaneously.

What makes it want to return images? Is this some sort of attempt at senses association?
50
1
Micky Kayn
Mengliu Zhao
·Follow
Sep 13, 2024
--
It's a multi-modal model, which is supposed to learn and predic images and texts simultaneously. Multi-modal models can eventually be used for text-image translation and vice versa.
--
--
Written by Mengliu Zhao247 Followers
·63 Following
Wish I can pay and get more humbleness
No responses yet
Help
Status
About
Careers
Press
Blog
Privacy
Terms
Text to speech
Teams