Space: meraGPT/meraKB

Asankhaya Sharma committed on Oct 3, 2023

Commit 5cbd50f (parent: 4e00df7)

i


Files changed (2)

README.md +174 -13
app.py +135 -0

README.md CHANGED

@@ -1,13 +1,174 @@
----
-title: MeraKB
-emoji: 📚
-colorFrom: purple
-colorTo: red
-sdk: streamlit
-sdk_version: 1.27.1
-app_file: app.py
-pinned: false
-license: apache-2.0
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Quivr
+<p align="center">
+<img src="../logo.png" alt="Quivr-logo" width="30%">
+</p>
+<p align="center">
+<a href="https://discord.gg/HUpRgp2HG8">
+  <img src="https://img.shields.io/badge/discord-join%20chat-blue.svg" alt="Join our Discord" height="40">
+</a>
+</p>
+Quivr is your second brain in the cloud, designed to easily store and retrieve unstructured information. It's like Obsidian but powered by generative AI.
+## Features
+- **Store Anything**: Quivr can handle almost any type of data: text, images, code snippets, and more.
+- **Generative AI**: Quivr uses large language models such as GPT-3.5 and Claude to help you generate and retrieve information.
+- **Fast and Efficient**: Quivr is designed for speed, so you can access your data as quickly as possible.
+- **Secure**: Your data is stored in your own Supabase project, so it always stays under your control.
+- **Compatible Files**:
+  - **Text**
+  - **Markdown**
+  - **PDF**
+  - **Audio**
+  - **Video**
+- **Open Source**: Quivr is open source and free to use.
+## Demo
+### Demo with GPT-3.5
+https://github.com/StanGirard/quivr/assets/19614572/80721777-2313-468f-b75e-09379f694653
+### Demo with Claude (100k context)
+https://github.com/StanGirard/quivr/assets/5101573/9dba918c-9032-4c8d-9eea-94336d2c8bd4
+## Getting Started
+These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
+### Prerequisites
+Make sure you have the following installed before continuing:
+- Python 3.10 or higher
+- Pip
+- Virtualenv
+You'll also need a [Supabase](https://supabase.com/) account with the following:
+- A new Supabase project
+- The project's API key
+- The project's URL
+### Installing
+- Clone the repository
+```bash
+git clone git@github.com:StanGirard/Quivr.git && cd Quivr
+```
+- Create a virtual environment
+```bash
+virtualenv venv
+```
+- Activate the virtual environment
+```bash
+source venv/bin/activate
+```
+- Install the dependencies
+```bash
+pip install -r requirements.txt
+```
+- Copy the example Streamlit `secrets.toml` file
+```bash
+cp .streamlit/secrets.toml.example .streamlit/secrets.toml
+```
+- Add your credentials to the `.streamlit/secrets.toml` file
+```toml
+supabase_url = "SUPABASE_URL"
+supabase_service_key = "SUPABASE_SERVICE_KEY"
+openai_api_key = "OPENAI_API_KEY"
+anthropic_api_key = "ANTHROPIC_API_KEY" # Optional
+```
+_Note that the `supabase_service_key` is found in your Supabase dashboard under Project Settings -> API. Use the `anon` `public` key found in the `Project API keys` section._
+- Run the following migration scripts on the Supabase database via the web interface (SQL Editor -> `New query`)
+```sql
+-- Enable the pgvector extension to work with embedding vectors
+create extension vector;
+-- Create a table to store your documents
+create table documents (
+  id bigserial primary key,
+  content text, -- corresponds to Document.pageContent
+  metadata jsonb, -- corresponds to Document.metadata
+  embedding vector(1536) -- 1536 works for OpenAI embeddings; BAAI/bge-large-en-v1.5 (configured in app.py) produces 1024 dimensions, so change every vector(1536) below if needed
+);
+CREATE FUNCTION match_documents(query_embedding vector(1536), match_count int)
+  RETURNS TABLE(
+    id bigint,
+    content text,
+    metadata jsonb,
+    -- we return matched vectors to enable maximal marginal relevance searches
+    embedding vector(1536),
+    similarity float)
+  LANGUAGE plpgsql
+  AS $$
+  # variable_conflict use_column
+BEGIN
+  RETURN query
+  SELECT
+    id,
+    content,
+    metadata,
+    embedding,
+    1 - (documents.embedding <=> query_embedding) AS similarity
+  FROM
+    documents
+  ORDER BY
+    documents.embedding <=> query_embedding
+  LIMIT match_count;
+END;
+$$;
+```
+Then run this migration to create the `stats` table:
+```sql
+create table
+  stats (
+    -- A column called "time" with data type "timestamp"
+    time timestamp,
+    -- Columns called "chat" and "embedding" with data type "boolean"
+    chat boolean,
+    embedding boolean,
+    -- A column called "details" with data type "text"
+    details text,
+    -- A column called "metadata" with data type "jsonb"
+    metadata jsonb,
+    -- An "integer" primary key column called "id" that is generated always as identity
+    id integer primary key generated always as identity
+  );
+```
+- Run the app
+```bash
+streamlit run main.py
+```
+## Built With
+* [Streamlit](https://streamlit.io/) - The app framework used.
+* [LangChain](https://www.langchain.com/) - Used for embeddings and the Supabase vector store.
+* [Supabase](https://supabase.io/) - The open source Firebase alternative.
+## Contributing
+Open a pull request and we'll review it as soon as possible.
+## Star History
+[![Star History Chart](https://api.star-history.com/svg?repos=StanGirard/quivr&type=Date)](https://star-history.com/#StanGirard/quivr&Date)
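The `match_documents` function in the migration above ranks rows with pgvector's `<=>` operator, which computes cosine distance, and reports `1 - distance` as `similarity`; ordering by ascending distance is therefore the same as ordering by descending similarity. The following is a minimal pure-Python sketch of that ranking, assuming non-zero vectors and using a hypothetical in-memory `documents` dict in place of the Postgres table. It is an illustration, not code from Quivr or meraKB.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, the quantity pgvector's `<=>` operator computes.

    Assumes both vectors are non-zero (otherwise the norm product is 0).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def match_documents(
    query_embedding: Sequence[float],
    documents: Dict[int, Sequence[float]],
    match_count: int,
) -> List[Tuple[int, float]]:
    """In-memory analogue of the SQL function (hypothetical): returns
    (id, similarity) rows ordered by ascending cosine distance,
    limited to match_count rows."""
    rows = sorted(
        documents.items(),
        key=lambda item: cosine_distance(item[1], query_embedding),
    )
    return [
        (doc_id, 1.0 - cosine_distance(embedding, query_embedding))
        for doc_id, embedding in rows[:match_count]
    ]


# Toy 2-dimensional embeddings; real ones have 1536 (OpenAI) or 1024 (bge-large) dimensions.
docs = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}
top = match_documents([1.0, 0.0], docs, match_count=2)
print([doc_id for doc_id, _ in top])  # [1, 3]
```

Document 3 points in a similar direction to the query (similarity about 0.707), so it outranks the orthogonal document 2 (similarity 0), which the `LIMIT` drops.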

app.py ADDED

@@ -0,0 +1,135 @@

+# app.py
+import os
+import tempfile
+import streamlit as st
+from files import file_uploader, url_uploader
+from question import chat_with_doc
+from brain import brain
+from langchain.embeddings import HuggingFaceInferenceAPIEmbeddings
+from langchain.vectorstores import SupabaseVectorStore
+from supabase import Client, create_client
+from explorer import view_document
+from stats import get_usage_today
+supabase_url = st.secrets.supabase_url
+supabase_key = st.secrets.supabase_service_key
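+openai_api_key = st.secrets.get("openai_api_key", "")  # optional: enables the GPT models below
+anthropic_api_key = st.secrets.get("anthropic_api_key", "")  # optional: enables the Claude models below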
+hf_api_key = st.secrets.hf_api_key
+supabase: Client = create_client(supabase_url, supabase_key)
+self_hosted = st.secrets.self_hosted
+# embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
+embeddings = HuggingFaceInferenceAPIEmbeddings(
+    api_key=hf_api_key,
+    model_name="BAAI/bge-large-en-v1.5"
+)
+vector_store = SupabaseVectorStore(supabase, embeddings, query_name='match_documents', table_name="documents")
+models = ["llama-2"]
+if openai_api_key:
+    models += ["gpt-3.5-turbo", "gpt-4"]
+if anthropic_api_key:
+    models += ["claude-v1", "claude-v1.3",
+               "claude-instant-v1-100k", "claude-instant-v1.1-100k"]
+# Configure the page title, layout and sidebar
+st.set_page_config(
+    page_title="meraKB",
+    layout="wide",
+    initial_sidebar_state="expanded",
+)
+st.title("🧠 meraKB - Your digital brain 🧠")
+st.markdown("Store your knowledge in a vector store and chat with it.")
+if self_hosted == "false":
+    st.markdown('**📢 Note: In the public demo, access to functionality is restricted. You can only use the GPT-3.5-turbo model and upload files up to 1 MB. To use more models and upload larger files, consider self-hosting meraKB.**')
+st.markdown("---\n\n")
+st.session_state["overused"] = False
+if self_hosted == "false":
+    usage = get_usage_today(supabase)
+    if usage > st.secrets.usage_limit:
+        st.markdown(
+            f"<span style='color:red'>You have used {usage} tokens today, which is more than your daily limit of {st.secrets.usage_limit} tokens. Please come back later or consider self-hosting.</span>", unsafe_allow_html=True)
+        st.session_state["overused"] = True
+    else:
+        st.markdown(f"<span style='color:blue'>Usage today: {usage} tokens out of {st.secrets.usage_limit}</span>", unsafe_allow_html=True)
+    st.write("---")
+# Initialize session state variables
+if 'model' not in st.session_state:
+    st.session_state['model'] = "llama-2"
+if 'temperature' not in st.session_state:
+    st.session_state['temperature'] = 0.1
+if 'chunk_size' not in st.session_state:
+    st.session_state['chunk_size'] = 500
+if 'chunk_overlap' not in st.session_state:
+    st.session_state['chunk_overlap'] = 0
+if 'max_tokens' not in st.session_state:
+    st.session_state['max_tokens'] = 500
+# Create a radio button for user to choose between adding knowledge or asking a question
+user_choice = st.radio(
+    "Choose an action", ('Add Knowledge', 'Chat with your Brain', 'Forget', "Explore"))
+st.markdown("---\n\n")
+if user_choice == 'Add Knowledge':
+    # Display chunk size and overlap selection only when adding knowledge
+    st.sidebar.title("Configuration")
+    st.sidebar.markdown(
+        "Choose your chunk size and overlap for adding knowledge.")
+    st.session_state['chunk_size'] = st.sidebar.slider(
+        "Select Chunk Size", 100, 1000, st.session_state['chunk_size'], 50)
+    st.session_state['chunk_overlap'] = st.sidebar.slider(
+        "Select Chunk Overlap", 0, 100, st.session_state['chunk_overlap'], 10)
+    # Create two columns for the file uploader and URL uploader
+    col1, col2 = st.columns(2)
+    with col1:
+        file_uploader(supabase, vector_store)
+    with col2:
+        url_uploader(supabase, vector_store)
+elif user_choice == 'Chat with your Brain':
+    # Display model and temperature selection only when asking questions
+    st.sidebar.title("Configuration")
+    st.sidebar.markdown(
+        "Choose your model and temperature for asking questions.")
+    if self_hosted != "false":
+        st.session_state['model'] = st.sidebar.selectbox(
+            "Select Model", models, index=models.index(st.session_state['model']))
+    else:
+        st.sidebar.write("**Model**: gpt-3.5-turbo")
+        st.sidebar.write("**Self-host to unlock more models, such as claude-v1 and GPT-4**")
+        st.session_state['model'] = "gpt-3.5-turbo"
+    st.session_state['temperature'] = st.sidebar.slider(
+        "Select Temperature", 0.1, 1.0, st.session_state['temperature'], 0.1)
+    if self_hosted != "false":
+        st.session_state['max_tokens'] = st.sidebar.slider(
+            "Select Max Tokens", 500, 4000, st.session_state['max_tokens'], 500)
+    else:
+        st.session_state['max_tokens'] = 500
+    chat_with_doc(st.session_state['model'], vector_store, stats_db=supabase)
+elif user_choice == 'Forget':
+    st.sidebar.title("Configuration")
+    brain(supabase)
+elif user_choice == 'Explore':
+    st.sidebar.title("Configuration")
+    view_document(supabase)
+st.markdown("---\n\n")
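app.py reads several secrets that the README's example `secrets.toml` does not list: `hf_api_key`, `self_hosted` and, on the public demo (`self_hosted = "false"`), `usage_limit`. The sketch below is a hypothetical pre-flight check; the helpers `missing_secrets` and `available_models` are illustrative names, not part of meraKB, and a plain dict stands in for `st.secrets`. It treats the OpenAI and Anthropic keys as optional, as the README's `# Optional` comment and app.py's `if openai_api_key:` checks suggest, and builds the model list the same way app.py does.

```python
from typing import Any, List, Mapping

# Keys this sketch treats as mandatory; the OpenAI and Anthropic keys are
# treated as optional because they only add entries to the model list.
REQUIRED_KEYS = ("supabase_url", "supabase_service_key", "hf_api_key", "self_hosted")


def missing_secrets(secrets: Mapping[str, Any]) -> List[str]:
    """Return the names of required secrets that are absent (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in secrets]
    # usage_limit is only read when the app runs as the public demo.
    if secrets.get("self_hosted") == "false" and "usage_limit" not in secrets:
        missing.append("usage_limit")
    return missing


def available_models(secrets: Mapping[str, Any]) -> List[str]:
    """Mirror app.py: llama-2 is always offered, other models are key-gated."""
    models = ["llama-2"]
    if secrets.get("openai_api_key"):
        models += ["gpt-3.5-turbo", "gpt-4"]
    if secrets.get("anthropic_api_key"):
        models += ["claude-v1", "claude-v1.3",
                   "claude-instant-v1-100k", "claude-instant-v1.1-100k"]
    return models


if __name__ == "__main__":
    # A plain dict standing in for st.secrets (placeholder values).
    example = {
        "supabase_url": "SUPABASE_URL",
        "supabase_service_key": "SUPABASE_SERVICE_KEY",
        "hf_api_key": "HF_API_KEY",
        "openai_api_key": "OPENAI_API_KEY",
        "self_hosted": "false",
    }
    print(missing_secrets(example))   # ['usage_limit']
    print(available_models(example))  # ['llama-2', 'gpt-3.5-turbo', 'gpt-4']
```

Because app.py compares `self_hosted` with the string `"false"`, it should be written as a quoted string in `secrets.toml`. An unquoted TOML boolean `false` loads as Python `False`, which never equals `"false"`, so the app would behave as if it were self-hosted.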