Embedding documents using optimized and quantized embedders
Integrate with optimized and quantized embedding models using LangChain Python.
Embed all of your documents using quantized embedders. The embedders are based on optimized models, created with optimum-intel and IPEX. The example text is based on SBERT.
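The model below is loaded through optimum-intel with IPEX, so those packages need to be available. A minimal setup sketch, assuming a notebook environment with pip (exact extras and versions may differ in your setup):

```python
%pip install -q langchain-community "optimum[neural-compressor]" intel-extension-for-pytorch
```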
```python
from langchain_community.embeddings import QuantizedBiEncoderEmbeddings

model_name = "Intel/bge-small-en-v1.5-rag-int8-static"
encode_kwargs = {"normalize_embeddings": True}  # set True to compute cosine similarity

model = QuantizedBiEncoderEmbeddings(
    model_name=model_name,
    encode_kwargs=encode_kwargs,
    query_instruction="Represent this sentence for searching relevant passages: ",
)
```
```
loading configuration file inc_config.json from cache at
INCConfig {
  "distillation": {},
  "neural_compressor_version": "2.4.1",
  "optimum_version": "1.16.2",
  "pruning": {},
  "quantization": {
    "dataset_num_samples": 50,
    "is_static": true
  },
  "save_onnx_model": false,
  "torch_version": "2.2.0",
  "transformers_version": "4.37.2"
}
Using `INCModel` to load a TorchScript model will be deprecated in v1.15.0, to load your model please use `IPEXModel` instead.
```
Let's ask a question and compare it to two documents. The first contains the answer to the question, and the second one does not. We can then check which one better suits our query.
```python
question = "How many people live in Berlin?"
```
```python
documents = [
    "Berlin had a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "Berlin is well known for its museums.",
]
```
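To check which document better suits the query, we can embed both the query and the documents and compare them. A minimal sketch under the setup above: since `normalize_embeddings=True`, the dot product of two embeddings equals their cosine similarity (the NumPy comparison below is our own illustration, not part of the LangChain API):

```python
import numpy as np

# Embed the query (the query_instruction set above is meant for queries)
# and both candidate documents.
query_embedding = model.embed_query(question)
doc_embeddings = model.embed_documents(documents)

# With normalized embeddings, the dot product is the cosine similarity.
scores = [float(np.dot(query_embedding, doc)) for doc in doc_embeddings]
best = int(np.argmax(scores))
print(f"Best match (score={scores[best]:.4f}): {documents[best]}")
```

With this setup, the first document (which states Berlin's population) should score higher than the second, museum-related one.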