I start the server with the command below, but it is not using the GPU. How do I run the server on the GPU? Also, the generated output is short; it seems that --n_ctx has no effect.
python3 -m llama_cpp.server --model ggml-model-f16.bin --port 7777 --host 192.168.0.1 --n_gpu_layers 30 --n_threads 4 --n_ctx 2048
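For context on what I have tried: as far as I understand, --n_gpu_layers only takes effect if the llama-cpp-python wheel was compiled with GPU support; a plain pip install gives a CPU-only build that silently ignores the flag. This is the reinstall I attempted (assuming an NVIDIA GPU with the CUDA toolkit installed; the cuBLAS build flag is taken from the llama-cpp-python README):

```shell
# Force a source rebuild of llama-cpp-python with cuBLAS (CUDA) support,
# so that --n_gpu_layers can actually offload layers to the GPU.
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python
```

After a GPU build, the server startup log should mention offloaded layers; with a CPU-only build it does not.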