Sep 13, 2024
It's a multi-modal model, which is supposed to learn and predic images and texts simultaneously. Multi-modal models can eventually be used for text-image translation and vice versa.
It's a multi-modal model, which is supposed to learn and predic images and texts simultaneously. Multi-modal models can eventually be used for text-image translation and vice versa.