Note: Instead of two queries, we could have taken a single long query and run named-entity extraction over it to determine the most likely keywords to run the initial search with. For this post, we won't be using this approach.

Figure 2 below depicts the architecture design of our application and the technical stack used for each of the components.

import torch
from transformers import CLIPModel, CLIPProcessor


class SimilarityUtil:
    def __init__(self):
        # CLIP_MODEL and CLIP_PREPROCESSOR hold the Hugging Face
        # checkpoint identifiers defined earlier in the script.
        self.model = CLIPModel.from_pretrained(CLIP_MODEL)
        self.processor = CLIPProcessor.from_pretrained(CLIP_PREPROCESSOR)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    def perform_sim_search(self, images, query_phrase, top_k=3):
        """
        Performs similarity search between the images and query.

        :param images: A list of PIL images initially retrieved with respect to
            some entity. These images should be semantically similar to this query.
        :param query_phrase: A list containing a single text query.
        :param top_k: Number of top images to return from `images`.
        :return: Top-k indices matching the query semantically and their
            similarity scores.
        """
        model = self.model.to(self.device)

        # Obtain the text-image similarity scores.
        with torch.no_grad():
            inputs = self.processor(
                text=query_phrase, images=images, return_tensors="pt", padding=True
            )
            inputs = inputs.to(self.device)
            outputs = model(**inputs)

        # Image-text similarity scores.
        logits_per_image = outputs.logits_per_image.cpu()
        (top_indices, top_scores) = self.sort_scores(logits_per_image, top_k)
        return (top_indices, top_scores)

    def sort_scores(self, scores, top_k):
        """
        Sorts the scores in a descending manner.

        :param scores: Image-text similarity scores.
        :param top_k: Number of top scores to return.
        """
        values, indices = scores.squeeze().topk(top_k)
        top_indices, top_scores = [], []
        for score, index in zip(values, indices):
            top_indices.append(int(index.numpy()))
            score = score.numpy().tolist()
            top_scores.append(round(score, 3))
        return (top_indices, top_scores)

CLIP_MODEL uses a ViT-base model to encode the images for generating meaningful embeddings with respect to the provided query. The text-based query is also encoded using a Transformer-based model for generating the embeddings. These two embeddings are matched with one another during inference. To know more about the particular methods we are using for the CLIP model, please refer to this documentation from Hugging Face.

In the code above, we first invoke the CLIP model with the images and the natural language query. This gives us a vector (logits_per_image) that contains the similarity scores between each of the images and the query. We then sort the vector in a descending manner. Note that we initialize the CLIP model while instantiating SimilarityUtil to save on model-loading time.

This is the meat of our application, and we have tackled it already. If you want to interact with this utility in a live manner, you can check out this Colab Notebook.
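For a quick local check, here is a minimal, hypothetical usage sketch of SimilarityUtil. It assumes CLIP_MODEL and CLIP_PREPROCESSOR both point to a checkpoint such as "openai/clip-vit-base-patch32"; the image paths and query are illustrative, not from the original post.

from PIL import Image

# Assumed checkpoint identifiers; the original script defines these constants.
CLIP_MODEL = "openai/clip-vit-base-patch32"
CLIP_PREPROCESSOR = "openai/clip-vit-base-patch32"

# Load a few locally available images (paths are illustrative).
images = [Image.open(path) for path in ["dog_1.jpg", "dog_2.jpg", "dog_3.jpg"]]

similarity_util = SimilarityUtil()
top_indices, top_scores = similarity_util.perform_sim_search(
    images, ["a dog playing in the snow"], top_k=2
)
print(top_indices)  # indices into `images`, best match first
print(top_scores)   # similarity scores, rounded to three decimal places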
Now, we need to collate our utilities for fetching images from Pixabay and for performing the natural language image search inside a single script, perform_search.py. Following is the main class of that script:

class Searcher:
    def __init__(self):
        self.similarity_model = SimilarityUtil()

    def get_similar_images(self, keyword, semantic_query, pixabay_max, top_k):
        """
        Finds semantically similar images.

        :param keyword: Keyword to search with on Pixabay.
        :param semantic_query: Query to find semantically similar images
            retrieved from Pixabay.
        :param pixabay_max: Number of maximum images to retrieve from Pixabay.
        :param top_k: Number of top images to return.
        :return: Tuple of top_k URLs and the similarity scores of the images
            present inside the URLs.
        """
        images_redis_key = keyword + "_images"
        urls_redis_key = keyword + "_urls"

        # Serve the images and URLs from the cache if this keyword has been
        # searched before; otherwise fetch from Pixabay and populate the cache.
        if redis_client.exists(images_redis_key) and redis_client.exists(
            urls_redis_key
        ):
            keyword_images = redis_client.get(images_redis_key)
            keyword_image_urls = redis_client.get(urls_redis_key)
        else:
            (keyword_images, keyword_image_urls) = fetch_images_tag(
                keyword, pixabay_max
            )
            redis_client.set(images_redis_key, keyword_images)
            redis_client.set(urls_redis_key, keyword_image_urls)

        (top_indices, top_scores) = self.similarity_model.perform_sim_search(
            keyword_images, semantic_query, top_k
        )
        top_urls = [keyword_image_urls[index] for index in top_indices]
        return (top_urls, top_scores)

Here, we are just calling the utilities we had previously developed to return the URLs of the most similar images and their scores. What is even more important here is the caching capability.
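One detail worth calling out: Redis stores values as bytes, so Python objects like lists of PIL images or URLs cannot be cached directly with set() and get(). Below is a minimal sketch of pickle-based helpers that could sit behind the caching calls above; redis_set and redis_get are hypothetical names, not part of the original utilities.

import pickle

import redis

# Assumption: a local Redis server on the default port.
redis_client = redis.Redis(host="localhost", port=6379)


def redis_set(key, value):
    # Serialize an arbitrary Python object (e.g., a list of PIL images
    # or a list of URLs) into bytes before caching it.
    redis_client.set(key, pickle.dumps(value))


def redis_get(key):
    # Deserialize the cached bytes back into the original Python object.
    return pickle.loads(redis_client.get(key))

With helpers like these, the cache-miss branch would call redis_set() and the cache-hit branch redis_get() in place of the raw client calls.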
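To tie it together, here is a hypothetical end-to-end call to Searcher; the keyword, query, and counts are illustrative, and the Pixabay and Redis utilities are assumed to be configured as earlier in the post.

searcher = Searcher()
top_urls, top_scores = searcher.get_similar_images(
    keyword="dog",
    semantic_query=["a dog playing in the snow"],
    pixabay_max=50,
    top_k=3,
)
for url, score in zip(top_urls, top_scores):
    print(score, url)

The first call for a given keyword fetches from Pixabay and populates the cache; repeated calls with the same keyword skip the network round trip entirely.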