
---
license: Apache License 2.0
tasks:
- Multimodal Models
---

BLIP-2: language-image pre-training with frozen image encoders and large language models. It shows strong performance on multimodal tasks such as VQA and image captioning. This checkpoint has 2.7B parameters.

#### Clone with HTTP

In Personal Center -> Models -> My Models, look up your access token. The token can be used to access the git repository.

```bash
git clone http://git.aistudio.baidu.com/aistudio/blip2-caption-opt2.7b.git
```

#### Quick Start

Install [PaddleMIX](https://github.com/PaddlePaddle/PaddleMIX), then run:

```python
import os

import numpy as np
import paddle
import requests
from PIL import Image

from paddlemix.models.blip2.modeling import Blip2ForConditionalGeneration
from paddlemix.models.blip2.utils import create_tokenizer, load_model
from paddlemix.processors.blip_processing import (
    Blip2Processor,
    BlipImageProcessor,
    BlipTextProcessor,
)

# Download a sample image and set the generation prompt.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "describe the image"

model_name_or_path = "aistudio/blip2-caption-opt2.7b"
text_model_name_or_path = "aistudio/opt-2.7b"

# Build the tokenizer and the image/text processors.
tokenizer_class = create_tokenizer(text_model_name_or_path, from_aistudio=True)
image_processor = BlipImageProcessor.from_pretrained(
    model_name_or_path, subfolder=os.path.join("processor", "eval"), from_aistudio=True
)
text_processor_class = BlipTextProcessor.from_pretrained(
    model_name_or_path, subfolder=os.path.join("processor", "eval"), from_aistudio=True
)
text_processor_class.prompt = ""
processor = Blip2Processor(image_processor, text_processor_class, tokenizer_class)

inputs = processor(
    images=image,
    text=prompt,
    return_tensors="pd",
    return_attention_mask=True,
    mode="test",
)

# Load the model; run the visual encoder and language model under AMP level O2.
model = Blip2ForConditionalGeneration.from_pretrained(
    model_name_or_path, from_aistudio=True
)
model.visual_encoder, model.language_model = paddle.amp.decorate(
    models=[model.visual_encoder, model.language_model], optimizers=None, level="O2"
)
model.eval()

generated_ids, scores = model.generate(**inputs)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(generated_text)
```

For training and fine-tuning, see [BLIP-2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/blip2).

## Related Papers and Citation

<pre>
@inproceedings{li2023blip2,
  title={{BLIP-2:} Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models},
  author={Junnan Li and Dongxu Li and Silvio Savarese and Steven Hoi},
  year={2023},
  booktitle={ICML},
}
</pre>
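The final step of the quick start turns generated token ids back into text with `batch_decode(..., skip_special_tokens=True)`. Conceptually, that call looks up each id in the vocabulary, drops special tokens such as padding and end-of-sequence markers, and joins the rest into a string. A minimal stdlib-only toy sketch of that idea (the vocabulary and token ids below are invented for illustration, not BLIP-2's actual vocabulary):

```python
# Toy sketch of a batch_decode-style function: map each token id to its string,
# optionally skip special tokens, and join the remainder into one text per sequence.
# The vocabulary and ids here are made up purely for illustration.
SPECIAL_IDS = {0, 1}  # e.g. <pad>, </s>
VOCAB = {0: "<pad>", 1: "</s>", 2: "a", 3: "cat", 4: "on", 5: "the", 6: "couch"}


def toy_batch_decode(batch_ids, skip_special_tokens=True):
    texts = []
    for ids in batch_ids:
        tokens = [
            VOCAB[i]
            for i in ids
            if not (skip_special_tokens and i in SPECIAL_IDS)
        ]
        texts.append(" ".join(tokens))
    return texts


print(toy_batch_decode([[2, 3, 4, 5, 6, 1, 0]])[0].strip())
# -> a cat on the couch
```

Real tokenizers also merge subword pieces and handle byte-level encodings, which this toy version ignores; it only shows why `skip_special_tokens=True` is needed to keep markers like `</s>` out of the caption.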