MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
MiniGPT-4 aligns a frozen visual encoder with a frozen LLM, Vicuna, using just one projection layer.
paper: https://github.com/Vision-CAIR/MiniGPT-4/blob/main/MiniGPT_4.pdf
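The alignment idea is small enough to sketch: only a single linear projection between the frozen vision encoder's output and the frozen LLM's embedding space is trained. A minimal illustrative sketch (dimensions and names are assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

class VisionToLLMProjection(nn.Module):
    """Illustrative sketch of MiniGPT-4's alignment idea: one trainable
    linear layer maps frozen vision-encoder features into the frozen
    LLM's (Vicuna's) input embedding space. Dimensions are assumed."""

    def __init__(self, vision_dim: int = 1408, llm_dim: int = 4096):
        super().__init__()
        # The only trained component; encoder and LLM stay frozen.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats: (batch, num_tokens, vision_dim) from the frozen encoder
        return self.proj(visual_feats)  # (batch, num_tokens, llm_dim)
```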
Whisper JAX
This repository contains optimised JAX code for OpenAI's Whisper Model, largely built on the 🤗 Hugging Face Transformers Whisper implementation. Compared to OpenAI's PyTorch code, Whisper JAX runs over 70x faster, making it the fastest Whisper implementation available.
github: https://github.com/sanchit-gandhi/whisper-jax
demo: https://huggingface.co/spaces/sanchit-gandhi/whisper-jax
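Usage follows the pipeline API documented in the repo's README; a minimal sketch (the audio path is an assumption):

```python
import jax.numpy as jnp
from whisper_jax import FlaxWhisperPipline

# Instantiate once; weights are loaded from the Hugging Face Hub.
# bfloat16 halves memory and speeds up inference on TPU/GPU.
pipeline = FlaxWhisperPipline("openai/whisper-large-v2", dtype=jnp.bfloat16)

# The first call JIT-compiles the forward pass (slow); subsequent calls
# reuse the compiled function and are fast.
text = pipeline("audio.mp3")  # "audio.mp3" is an assumed local file
print(text)
```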
🐶 Bark
Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying.
github: https://github.com/suno-ai/bark
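A minimal generation sketch following the Python API documented in the repo (prompt and output filename are assumptions):

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

# Download and cache the model weights.
preload_models()

# Generate a waveform as a NumPy array at Bark's native sample rate.
# Bracketed cues like [laughs] steer the nonverbal sounds.
audio_array = generate_audio("Hello, my name is Suno. [laughs] Nice to meet you!")

# Save to disk; "bark_out.wav" is just an example filename.
write_wav("bark_out.wav", SAMPLE_RATE, audio_array)
```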
StableLM: Stability AI Language Models
Stability AI released an initial set of StableLM-Alpha models with 3B and 7B parameters; 15B and 30B models are on the way. The base models are released under CC BY-SA-4.0.
github: https://github.com/Stability-AI/StableLM
demo: https://huggingface.co/spaces/stabilityai/stablelm-tuned-alpha-chat
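The checkpoints load through Hugging Face Transformers; a minimal generation sketch (the checkpoint name is from the release, the prompt and sampling settings are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base alpha checkpoint from the StableLM release.
name = "stabilityai/stablelm-base-alpha-7b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)
model = model.to("cuda")

inputs = tokenizer("What is a language model?", return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.95
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```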
🌋 LLaVA: Large Language and Vision Assistant
LLaVA is a novel end-to-end trained large multimodal model that connects a vision encoder with Vicuna for general-purpose visual and language understanding. It achieves impressive chat capabilities reminiscent of the multimodal GPT-4 and sets a new state-of-the-art accuracy on Science QA.
github: https://github.com/haotian-liu/LLaVA
demo: https://llava.hliu.cc/
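At a high level, the model feeds projected image features to the LLM alongside the text prompt. A hypothetical sketch of that multimodal input assembly (shapes and names are assumptions, not the repo's code):

```python
import torch
import torch.nn as nn

def build_multimodal_input(image_feats: torch.Tensor,
                           text_embeds: torch.Tensor,
                           projector: nn.Module) -> torch.Tensor:
    """Hypothetical sketch: project vision-encoder features into the LLM's
    embedding space and prepend them to the text token embeddings.

    image_feats: (batch, n_img_tokens, vision_dim) from the vision encoder
    text_embeds: (batch, n_text_tokens, llm_dim) from the LLM embedding table
    returns:     (batch, n_img_tokens + n_text_tokens, llm_dim)
    """
    image_tokens = projector(image_feats)                  # map into LLM space
    return torch.cat([image_tokens, text_embeds], dim=1)  # image tokens first
```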