This Python script, `low_level_api_llama_cpp.py`, demonstrates how to use the low-level API of the llama_cpp library. It runs inference on a given prompt using a .gguf model and prints the generated tokens and text.
### Prerequisites
Before running the script, ensure that you have the following dependencies installed:
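- Python 3.6 or higher
- llama-cpp-python: Python bindings (imported as `llama_cpp`) for llama.cpp, the C/C++ library that loads and runs .gguf models
- NumPy: A fundamental package for scientific computing with Python
- multiprocessing: A Python standard-library module for process-based parallelism (no separate installation needed)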
### Usage
Install dependencies (`ctypes`, `os`, and `multiprocessing` ship with Python and are not installed through pip):
```bash
python -m pip install llama-cpp-python numpy
```
Run the script:
```bash
python low_level_api_llama_cpp.py
```
## Code Structure
The script is organized as follows:
### Initialization
- Load the model from the specified path.
- Create a context for model evaluation.
### Tokenization
- Tokenize the input prompt using the `llama_tokenize` function.
- Prepare the input tokens for model evaluation.
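
The buffer handling in this step can be sketched without loading a model. The example below assumes the tokenizer writes integer token IDs into a caller-allocated ctypes array and returns how many slots it filled, which is the usual pattern for C-style tokenizer functions. `fake_tokenize` is a hypothetical stand-in for `llama_tokenize` (it emits one token per byte) so that the snippet runs on its own; the real function's exact arguments depend on the installed llama-cpp-python version.

```python
import ctypes


def fake_tokenize(text: bytes, tokens, n_max_tokens: int) -> int:
    """Hypothetical stand-in for llama_tokenize: writes one token ID per byte."""
    n = min(len(text), n_max_tokens)
    for i in range(n):
        tokens[i] = text[i]
    return n


prompt = b"What is the capital of France?"

# Size the buffer for the worst case: one token per byte, plus one spare slot
# (for example, for a beginning-of-sequence token).
max_tokens = len(prompt) + 1
token_buffer = (ctypes.c_int32 * max_tokens)()

n_tokens = fake_tokenize(prompt, token_buffer, max_tokens)

# Keep only the slots the tokenizer actually filled.
input_tokens = list(token_buffer[:n_tokens])
print(n_tokens, input_tokens[:4])
```

Over-allocating and then slicing by the returned count avoids a second pass to measure the prompt before tokenizing it.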
### Inference
- Perform model evaluation to generate responses.
- Sample from the model's output using various strategies (top-k, top-p, temperature).
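
To make the sampling strategies concrete, here is a minimal pure-Python sketch that applies top-k, then top-p, then temperature to a list of logits. It is an illustration of the general technique, not the library's implementation: the function name `sample_token`, its default values, and the order of the filters are assumptions chosen for this example.

```python
import math
import random


def sample_token(logits, top_k=40, top_p=0.95, temperature=0.8, rng=random):
    """Pick a token ID from raw logits using top-k, top-p, then temperature."""
    # A temperature of zero (or below) is treated as greedy decoding.
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Top-k: keep the k token IDs with the highest logits.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    ranked = ranked[:max(1, top_k)]

    # Softmax over the survivors (subtract the peak for numerical stability).
    peak = logits[ranked[0]]
    weights = [math.exp(logits[i] - peak) for i in ranked]
    total = sum(weights)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, weight in zip(ranked, weights):
        kept.append(token_id)
        cumulative += weight / total
        if cumulative >= top_p:
            break

    # Temperature: rescale the surviving logits, then draw one token.
    scaled = [math.exp((logits[i] - peak) / temperature) for i in kept]
    return rng.choices(kept, weights=scaled, k=1)[0]


logits = [0.1, 2.0, -1.0, 0.5]
print(sample_token(logits, temperature=0))  # greedy: index of the largest logit
```

Lower temperatures concentrate probability on the highest-scoring tokens, while smaller `top_k` and `top_p` values shrink the candidate set before the draw.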
### Output
- Print the generated tokens and the corresponding decoded text.
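
One detail worth illustrating is decoding. The sketch below assumes each generated token maps to a short byte string, and that a multi-byte UTF-8 character may be split across two tokens. The token IDs and byte pieces are made up for the example, and `decode_pieces` is a hypothetical helper, not part of the library.

```python
def decode_pieces(pieces):
    """Join raw token bytes first, then decode once, so split characters survive."""
    return b"".join(pieces).decode("utf-8", errors="ignore")


# Hypothetical (token_id, piece) pairs; "é" (0xC3 0xA9) is split across two tokens.
generated = [(101, b" Par"), (102, b"is, caf"), (103, b"\xc3"), (104, b"\xa9")]

# Print each generated token next to its raw bytes.
for token_id, piece in generated:
    print(token_id, piece)

# Decode the whole sequence in one step.
text = decode_pieces(piece for _, piece in generated)
print(text)
```

Decoding each piece separately with `errors="ignore"` would silently drop the split character, which is why the bytes are concatenated before decoding.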
### Cleanup
- Free resources and print timing information.
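
A general way to make sure cleanup always runs is a `try`/`finally` block around the inference work. The sketch below shows only the pattern: `FakeContext` and `run_inference` are hypothetical names standing in for the native context and the script's main loop, and the timing line is a simplified substitute for whatever timing report the library prints.

```python
import time


class FakeContext:
    """Hypothetical stand-in for a native context that must be freed explicitly."""

    def __init__(self):
        self.freed = False

    def free(self):
        self.freed = True


def run_inference(ctx, work):
    """Run work(ctx), then always free ctx and report the elapsed time."""
    start = time.perf_counter()
    try:
        return work(ctx)
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        ctx.free()
        print(f"total time = {elapsed_ms:.2f} ms")


ctx = FakeContext()
result = run_inference(ctx, lambda c: "done")
```

Because the release happens in `finally`, the context is freed even if evaluation raises an exception partway through.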
## Configuration
Customize the inference behavior by adjusting the following variables:
- `N_THREADS`: Number of CPU threads to use for model evaluation.
- `MODEL_PATH`: Path to the .gguf model file.
- `prompt`: Input prompt passed to the model.
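
As a rough illustration, these variables might be set near the top of the script as follows. The values are assumptions for the sketch: the `MODEL` environment variable, the fallback path, and the prompt text are placeholders rather than settings confirmed by this document.

```python
import multiprocessing
import os

# Use every available CPU core for model evaluation.
N_THREADS = multiprocessing.cpu_count()

# Read the model location from a (hypothetical) MODEL environment variable,
# falling back to a placeholder path.
MODEL_PATH = os.environ.get("MODEL", "./models/model.gguf")

# Low-level bindings take C strings, so the prompt is kept as bytes.
prompt = b"\n\n### Instruction:\nWhat is the capital of France?\n\n### Response:\n"

print(N_THREADS, MODEL_PATH)
```

Reading the path from the environment lets the same script run against different models without editing the source.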
## Notes
- Ensure that the llama_cpp library is built and available on the system. Follow the instructions in the llama_cpp repository for building and installing the library.
- This script is designed to work with .gguf models and may require modifications for compatibility with other model formats.
## Acknowledgments
This code is based on the llama_cpp library developed by the community. Special thanks to the contributors for their efforts.
## License
This project is licensed under the MIT License - see the LICENSE file for details.