A while ago, I had requested help with using LLMs to manage all my teaching notes. I have since installed Ollama and been playing with it to get a feel for the setup.
It was also suggested that I use RAG (Retrieval-Augmented Generation) and a CA (cognitive architecture). However, I am unclear on good self-hosted options for these two tasks. Could you please suggest a few?
For example, I tried ragflow.io and installed it on my system, but it seems I need to set up an account with a username and password to use it. It remains unclear whether I can use the system offline like the base Ollama model, and whether information will be sent off my computer.
- You can use h2ogpt, which lets you build a RAG over your chosen documents without writing any code.
- You should ask @[email protected]. He seems to know all about this stuff.
- I have an old Lenovo laptop with an NVIDIA graphics card.
- @[email protected] The biggest question I have for you is what graphics card, but generally speaking this is… less than ideal.
- To answer your question, Open Web UI is the new hotness: https://github.com/open-webui/open-webui
- I personally use exui for a lot of my LLM work, but that’s because I’m an uber minimalist.
- And on your setup, I would host the best model you can on kobold.cpp or the built-in llama.cpp server (just not Ollama) and use Open Web UI as your front end. You can also use llama.cpp to host an embeddings model for RAG, if you wish.
- This is a general ranking of the “best” models for document answering and summarization: https://huggingface.co/spaces/vectara/Hallucination-evaluation-leaderboard
- …But generally, I prefer not to mess with RAG retrieval and just slap the context I want into the LLM myself; for this, the performance of your machine is kind of critical (depending on just how much “context” you want it to cover). There is a minimal sketch of that context-stuffing approach right after this list. I know this is !selfhosted, but once you get your setup dialed in, you may consider making calls to an API like Groq, Cerebras or whatever, or even renting a Runpod GPU instance if that’s in your time/money budget.
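Since a couple of the suggestions above boil down to “host a model locally and put the context in yourself,” here is a minimal sketch of that context-stuffing approach. It assumes a llama.cpp `llama-server` running locally on port 8080 (its default) and uses its OpenAI-compatible chat endpoint; the notes path, question, and model label are hypothetical placeholders.

```python
# Minimal "context stuffing": read your notes, paste them into the prompt,
# and ask a local llama.cpp server (OpenAI-compatible endpoint) about them.
# Assumes you started the server yourself, e.g. with something like:
#   llama-server -m your-model.gguf --port 8080
import requests

NOTES_PATH = "notes/week1.md"  # hypothetical path to one of your teaching notes
SERVER = "http://localhost:8080/v1/chat/completions"  # llama.cpp's OpenAI-style endpoint

with open(NOTES_PATH, encoding="utf-8") as f:
    notes = f.read()  # the whole file must fit in the model's context window

question = "Summarize the key points of this lecture in five bullet points."

resp = requests.post(SERVER, json={
    "model": "local",  # llama.cpp serves whatever model it was started with
    "messages": [
        {"role": "system", "content": "Answer using only the notes provided."},
        {"role": "user", "content": f"Notes:\n{notes}\n\nQuestion: {question}"},
    ],
    "temperature": 0.2,
})
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Nothing in this loop leaves your machine, which also speaks to the offline/privacy concern in the original post.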
 
- Not sure how Ollama integration works in general, but these are two good libraries for RAG:
- Why don’t you build your own? A bare-bones sketch of what that could look like is below.
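Following up on that suggestion: a minimal DIY retrieval loop against a local Ollama instance, using its /api/embeddings and /api/generate endpoints. The model names (nomic-embed-text, llama3), the sample chunks, and the question are illustrative assumptions; pull whichever models you prefer with `ollama pull` first.

```python
# A bare-bones DIY RAG loop against a local Ollama instance: embed note chunks,
# find the ones most similar to the question, and stuff them into the prompt.
# Assumes Ollama is running locally and you have pulled an embedding model and
# a chat model, e.g.:  ollama pull nomic-embed-text  /  ollama pull llama3
import math
import requests

OLLAMA = "http://localhost:11434"
EMBED_MODEL = "nomic-embed-text"  # assumed embedding model; swap in your own
CHAT_MODEL = "llama3"             # assumed chat model; swap in your own

def embed(text: str) -> list[float]:
    """Get an embedding vector for `text` from Ollama's embeddings endpoint."""
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": EMBED_MODEL, "prompt": text})
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical note chunks; in practice, split your files into paragraphs.
chunks = [
    "Week 1 covered Newton's laws and worked examples on friction.",
    "Week 2 introduced energy conservation and the work-energy theorem.",
    "Grading: two midterms at 25% each, final exam 40%, homework 10%.",
]
index = [(c, embed(c)) for c in chunks]  # tiny in-memory "vector store"

question = "How is the course graded?"
q_vec = embed(question)
top = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:2]
context = "\n".join(c for c, _ in top)

r = requests.post(f"{OLLAMA}/api/generate", json={
    "model": CHAT_MODEL,
    "prompt": f"Using only this context:\n{context}\n\nAnswer: {question}",
    "stream": False,  # return one JSON object instead of a token stream
})
r.raise_for_status()
print(r.json()["response"])
```

For real notes you would chunk files into paragraphs and persist the vectors (even a SQLite table works) instead of re-embedding everything on every run.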

