Build a Milvus Powered Text-Image Search Engine in Minutes


This notebook illustrates how to build a text-image search engine from scratch using Milvus. Milvus is an open-source vector database built for AI applications, and it supports nearest neighbor embedding search across tens of millions of entries. We’ll go through the text-image search procedure and evaluate the performance. The core functionality fits in about a dozen lines of code, which you can use as a starting point for hacking your own image search engine.

Preparation

Install Dependencies

First we need to install dependencies such as towhee, gradio and opencv-python.

In [1]:! python -m pip install -q towhee gradio opencv-python

Prepare the data

The dataset used in this demo is a subset of the ImageNet dataset (100 classes, 10 images for each class). It is available via GitHub.

The dataset is organized as follows:

train: directory of candidate images;
test: directory of test images;
reverse_image_search.csv: a csv file containing an id, path, and label for each image;

Let’s take a quick look:

In [2]:
! curl -L https://github.com/towhee-io/examples/releases/download/data/reverse_image_search.zip -O
! unzip -q -o reverse_image_search.zip

In [3]:
import pandas as pd

df = pd.read_csv('reverse_image_search.csv')
df.head()

Out[3]:

	id	path	label
0	0	./train/brain_coral/n01917289_1783.JPEG	brain_coral
1	1	./train/brain_coral/n01917289_4317.JPEG	brain_coral
2	2	./train/brain_coral/n01917289_765.JPEG	brain_coral
3	3	./train/brain_coral/n01917289_1079.JPEG	brain_coral
4	4	./train/brain_coral/n01917289_2484.JPEG	brain_coral

To use the dataset for text-image search, let’s first define a helper function:

read_images(results): read images by image IDs;

In [4]:
import cv2
from towhee.types.image import Image

# Map each image ID to its file path.
id_img = df.set_index('id')['path'].to_dict()

def read_images(results):
    # Load the image behind each search result, looked up by result ID.
    imgs = []
    for res in results:
        path = id_img[res.id]
        imgs.append(Image(cv2.imread(path), 'BGR'))
    return imgs

Create a Milvus Collection

Before getting started, please make sure that you have started a Milvus service. This notebook uses Milvus 2.2.10 and pymilvus 2.2.11.

In [ ]:! python -m pip install -q pymilvus==2.2.11

Let’s first create a text_image_search collection that uses the L2 distance metric and an IVF_FLAT index.

In [5]:
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

def create_milvus_collection(collection_name, dim):
    connections.connect(host='127.0.0.1', port='19530')

    # Start from a clean slate if the collection already exists.
    if utility.has_collection(collection_name):
        utility.drop_collection(collection_name)

    fields = [
        FieldSchema(name='id', dtype=DataType.INT64, description='ids', is_primary=True, auto_id=False),
        FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, description='embedding vectors', dim=dim)
    ]
    schema = CollectionSchema(fields=fields, description='text image search')
    collection = Collection(name=collection_name, schema=schema)

    # create IVF_FLAT index for collection.
    index_params = {
        'metric_type': 'L2',
        'index_type': 'IVF_FLAT',
        'params': {'nlist': 512}
    }
    collection.create_index(field_name='embedding', index_params=index_params)
    return collection

collection = create_milvus_collection('text_image_search', 512)
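To give a feel for what an IVF_FLAT index with an nlist parameter does, here is a toy pure-Python sketch. It is not Milvus's implementation: the helper names are made up, and the centroids are simply sampled from the data, whereas real IVF indexes train them with k-means. Vectors are grouped into nlist buckets by nearest centroid, and a query scans only the nprobe buckets closest to it.

```python
import math
import random

def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, nlist, seed=0):
    """Assign every vector to the bucket of its nearest centroid.

    Assumption for brevity: centroids are sampled from the data
    instead of being trained with k-means.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)
    buckets = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: l2(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets

def ivf_search(query, vectors, centroids, buckets, nprobe, limit):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    candidates.sort(key=lambda i: l2(query, vectors[i]))
    return candidates[:limit]

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
centroids, buckets = build_ivf(data, nlist=8)
query = data[17]

# Brute-force answer for comparison.
exact = sorted(range(len(data)), key=lambda i: l2(query, data[i]))[:5]
# Probing every bucket is equivalent to brute force.
probe_all = ivf_search(query, data, centroids, buckets, nprobe=8, limit=5)
# Probing one bucket is cheaper but may miss some true neighbors.
probe_one = ivf_search(query, data, centroids, buckets, nprobe=1, limit=5)
```

With nprobe equal to nlist the search matches brute force. Smaller nprobe values trade some recall for speed, which is the tuning knob this kind of index exposes.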

Text Image Search

In this section, we’ll show how to build our text-image search engine using Milvus. The basic idea behind our text-image search is to extract embeddings from images and texts using a deep neural network and compare the embeddings with those stored in Milvus.

We use Towhee, a machine learning framework for building data processing pipelines. It also provides predefined operators that implement insert and query operations in Milvus.
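The pipelines in this notebook are built from steps such as map(input_column, output_column, function) and output(columns). As a rough mental model only, the plain-Python mimic below shows what those steps do to a row of data. It is not how Towhee is implemented, and apply_map and select_output are hypothetical helper names.

```python
def apply_map(rows, src, dst, fn):
    """Mimic map(src, dst, fn): read column src, write fn's result to column dst."""
    return [{**row, dst: fn(row[src])} for row in rows]

def select_output(rows, *columns):
    """Mimic output(...): keep only the named columns of each row."""
    return [{col: row[col] for col in columns} for row in rows]

rows = [{"text": "A white dog"}, {"text": "A teddybear on a skateboard"}]
rows = apply_map(rows, "text", "words", str.split)
rows = apply_map(rows, "words", "count", len)
# Using the same source and destination overwrites the column,
# as map('vec', 'vec', ...) does in the notebook's pipelines.
rows = apply_map(rows, "count", "count", lambda n: n * 10)
result = select_output(rows, "text", "count")
```

Each step adds or replaces one column, and the final output step decides which columns the caller sees.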

Generate image and text embeddings with CLIP

The image_text_embedding.clip operator extracts features from images or text with CLIP. CLIP jointly trains an image encoder and a text encoder to maximize the cosine similarity between matching image-text pairs, so it can embed both images and text into the same vector space.

In [6]:
from towhee import ops, pipe, DataCollection
import numpy as np

In [7]:
p = (
    pipe.input('path')
    .map('path', 'img', ops.image_decode.cv2('rgb'))
    .map('img', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch16', modality='image'))
    .map('vec', 'vec', lambda x: x / np.linalg.norm(x))
    .output('img', 'vec')
)

DataCollection(p('./teddy.png')).show()

img	vec
	[0.037240546, -0.065988705, -0.010860455, …] shape=(512,)

In [8]:
p2 = (
    pipe.input('text')
    .map('text', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch16', modality='text'))
    .map('vec', 'vec', lambda x: x / np.linalg.norm(x))
    .output('text', 'vec')
)

DataCollection(p2("A teddybear on a skateboard in Times Square.")).show()

text	vec
A teddybear on a skateboard in Times Square.	[-0.0086854, 0.02717687, -0.0007425508, …] shape=(512,)

Here is a detailed explanation of the code:

map('path', 'img', ops.image_decode.cv2('rgb')): for each row of the data, read and decode the image at path and put the pixel data into column img;
map('img', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch16', modality='image'/'text')): extract the image or text embedding with ops.image_text_embedding.clip, an operator from the Towhee hub. This operator supports several models, including clip_vit_base_patch16, clip_vit_base_patch32, clip_vit_large_patch14, clip_vit_large_patch14_336, etc.;
map('vec', 'vec', lambda x: x / np.linalg.norm(x)): normalize each embedding to unit length and write it back to column vec.
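Each pipeline divides the embedding by its norm before the vector reaches Milvus. One plausible reason, not stated in the source, is that for unit-length vectors the squared L2 distance equals 2 minus twice the cosine similarity. The L2 metric configured on the collection then ranks hits in the same order as cosine similarity, which is the quantity CLIP is trained on. A minimal sketch with made-up 3-dimensional vectors (the real embeddings here have 512 dimensions):

```python
import math

def normalize(vec):
    """Scale a vector to unit length, like x / np.linalg.norm(x) in the pipelines."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Made-up vectors for illustration only.
query = normalize([0.2, 0.9, 0.1])
candidates = {
    "a": normalize([0.1, 1.0, 0.0]),
    "b": normalize([1.0, 0.1, 0.3]),
    "c": normalize([0.3, 0.7, 0.6]),
}

for name, vec in candidates.items():
    cosine = dot(query, vec)
    # For unit vectors: ||q - v||^2 = 2 - 2 * cos(q, v)
    assert math.isclose(squared_l2(query, vec), 2 - 2 * cosine, abs_tol=1e-12)

# Ascending L2 distance and descending cosine similarity give the same ranking.
by_l2 = sorted(candidates, key=lambda n: squared_l2(query, candidates[n]))
by_cosine = sorted(candidates, key=lambda n: dot(query, candidates[n]), reverse=True)
```

Without the normalization step the two rankings can differ, because L2 distance is also sensitive to vector length.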

Load Image Embeddings into Milvus

We first extract embeddings from the images with the clip_vit_base_patch16 model and insert them into Milvus for indexing. Towhee provides a method-chaining style API so that users can assemble a data processing pipeline from operators.

In [18]:
%%time
collection = create_milvus_collection('text_image_search', 512)

def read_csv(csv_path, encoding='utf-8-sig'):
    import csv
    with open(csv_path, 'r', encoding=encoding) as f:
        data = csv.DictReader(f)
        for line in data:
            yield int(line['id']), line['path']

p3 = (
    pipe.input('csv_file')
    .flat_map('csv_file', ('id', 'path'), read_csv)
    .map('path', 'img', ops.image_decode.cv2('rgb'))
    .map('img', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch16', modality='image', device=0))
    .map('vec', 'vec', lambda x: x / np.linalg.norm(x))
    .map(('id', 'vec'), (), ops.ann_insert.milvus_client(host='127.0.0.1', port='19530', collection_name='text_image_search'))
    .output()
)

ret = p3('reverse_image_search.csv')

CPU times: user 1min 5s, sys: 46.7 s, total: 1min 52s
Wall time: 28.6 s

In [11]:collection.load()

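In [12]:
print('Total number of inserted data is {}.'.format(collection.num_entities))

Total number of inserted data is 0.

The count reads 0 here even though the insert pipeline has run. A likely cause is that num_entities only reflects data that has already been flushed, so calling collection.flush() before reading it should report the inserted rows.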
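The read_csv generator passed to flat_map in the loading pipeline yields one (id, path) tuple per CSV row, and each tuple becomes its own row in the pipeline. The sketch below runs the same generator outside Towhee on a tiny stand-in file, so it needs neither the dataset nor Milvus. It also shows one reason the utf-8-sig default can matter: if the file starts with a byte order mark, plain utf-8 leaves it glued to the first header name. The sample file and its BOM are assumptions made for the demonstration.

```python
import csv
import os
import tempfile

def read_csv(csv_path, encoding="utf-8-sig"):
    """Yield one (id, path) tuple per CSV row; utf-8-sig drops a leading BOM."""
    with open(csv_path, "r", encoding=encoding, newline="") as f:
        for line in csv.DictReader(f):
            yield int(line["id"]), line["path"]

# Stand-in CSV written with a BOM, so the sketch runs without the dataset.
sample = (
    "id,path,label\n"
    "0,./train/brain_coral/n01917289_1783.JPEG,brain_coral\n"
    "1,./train/brain_coral/n01917289_4317.JPEG,brain_coral\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".csv", encoding="utf-8-sig",
                                 delete=False) as tmp:
    tmp.write(sample)

rows = list(read_csv(tmp.name))

# Reading the same file as plain utf-8 keeps the BOM in the first header name.
with open(tmp.name, "r", encoding="utf-8", newline="") as f:
    plain_header = next(csv.reader(f))

os.remove(tmp.name)
```

With plain utf-8 the lookup line['id'] would raise a KeyError on such a file, because the column would be named '\ufeffid'.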

Query Matched Images from Milvus

Now that embeddings for the candidate images have been inserted into Milvus, we can query it for nearest neighbors. Again, we use Towhee to take the input text, compute an embedding vector, and use that vector as a query for Milvus. Because Milvus only returns image IDs and distance values, we provide a read_image function that fetches the original images by ID for display.

In [13]:
import pandas as pd
import cv2

def read_image(image_ids):
    df = pd.read_csv('reverse_image_search.csv')
    id_img = df.set_index('id')['path'].to_dict()
    imgs = []
    decode = ops.image_decode.cv2('rgb')
    for image_id in image_ids:
        path = id_img[image_id]
        imgs.append(decode(path))
    return imgs

p4 = (
    pipe.input('text')
    .map('text', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch16', modality='text'))
    .map('vec', 'vec', lambda x: x / np.linalg.norm(x))
    .map('vec', 'result', ops.ann_search.milvus_client(host='127.0.0.1', port='19530', collection_name='text_image_search', limit=5))
    .map('result', 'image_ids', lambda x: [item[0] for item in x])
    .map('image_ids', 'images', read_image)
    .output('text', 'images')
)

DataCollection(p4("A white dog")).show()
DataCollection(p4("A black dog")).show()

text	images
A white dog


text	images
A black dog
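The query pipeline turns raw search hits into image IDs with item[0] and then looks up each ID's file path. The sketch below isolates that step in plain Python. It assumes each hit is a sequence whose first item is the image ID and whose second item is the distance. The hits_to_paths helper and the sample hits are made up for illustration.

```python
def hits_to_paths(hits, id_to_path):
    """Return image paths for search hits, nearest first.

    Assumes each hit is (image_id, distance, ...). IDs missing from the
    lookup table are skipped instead of raising a KeyError.
    """
    ordered = sorted(hits, key=lambda hit: hit[1])
    return [id_to_path[hit[0]] for hit in ordered if hit[0] in id_to_path]

# Hypothetical ID-to-path table and hits, made up for illustration.
id_to_path = {
    0: "./train/brain_coral/n01917289_1783.JPEG",
    1: "./train/brain_coral/n01917289_4317.JPEG",
    2: "./train/brain_coral/n01917289_765.JPEG",
}
hits = [(2, 0.91), (0, 0.87), (7, 1.20), (1, 1.05)]
paths = hits_to_paths(hits, id_to_path)
```

Sorting by ascending distance matches the L2 metric used by the collection, where a smaller value means a closer match.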

Release a Showcase

We’ve done an excellent job on the core functionality of our text-image search engine. Now it’s time to build a showcase with an interface. Gradio is a great tool for building demos. With Gradio, we simply need to wrap the data processing pipeline in a search function:

In [20]:
search_pipeline = (
    pipe.input('text')
    .map('text', 'vec', ops.image_text_embedding.clip(model_name='clip_vit_base_patch16', modality='text'))
    .map('vec', 'vec', lambda x: x / np.linalg.norm(x))
    .map('vec', 'result', ops.ann_search.milvus_client(host='127.0.0.1', port='19530', collection_name='text_image_search', limit=5))
    .map('result', 'image_ids', lambda x: [item[0] for item in x])
    .output('image_ids')
)

def search(text):
    df = pd.read_csv('reverse_image_search.csv')
    id_img = df.set_index('id')['path'].to_dict()
    image_ids = search_pipeline(text).to_list()[0][0]
    return [id_img[image_id] for image_id in image_ids]

In [21]:
import gradio

interface = gradio.Interface(
    search,
    gradio.inputs.Textbox(lines=1),
    [gradio.outputs.Image(type="filepath", label=None) for _ in range(5)]
)

interface.launch(inline=True, share=True)

Running on local URL: http://127.0.0.1:7863
Running on public URL: https://0353404c1d46f8b38d.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
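The interface declares five image outputs, so the search function is expected to return five values. If a query ever returned fewer than five hits, the list would be too short. One way to guard against that is to pad the result before returning it. The pad_results helper below is hypothetical, and whether a None entry renders as an empty image slot is an assumption that may depend on the Gradio version.

```python
def pad_results(paths, size=5, filler=None):
    """Return exactly `size` items: truncate extras, pad a short list with filler."""
    trimmed = list(paths[:size])
    return trimmed + [filler] * (size - len(trimmed))

# Hypothetical paths, made up for illustration.
short = pad_results(["a.JPEG", "b.JPEG"])
exact = pad_results(["1.JPEG", "2.JPEG", "3.JPEG", "4.JPEG", "5.JPEG"])
long_list = pad_results([str(i) for i in range(8)])
```

In the search function this would wrap the final list, for example by returning pad_results of the looked-up paths.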
