amalad committed
Commit: cd89c5c · Parent: 0f76577

Update README

Files changed (2)
  1. README.md +2 -2
  2. explainability.md +2 -2
README.md CHANGED
@@ -41,7 +41,7 @@ Global
 
 Customers: AI foundry enterprise customers
 
- Use Cases: Image summarization. Text-image analysis, Optical Character Recognition, Interactive Q&A on images, Comparison and contrast of multiple images, Text Chain-of-Thought reasoning.
+ Use Cases: Image summarization. Text-image analysis, Optical Character Recognition, Interactive Q&A on images, Text Chain-of-Thought reasoning
 
 
 ## Release Date:
@@ -62,7 +62,7 @@ Language Encoder: Llama-3.1-8B-Instruct
 ### Input
 
 Input Type(s): Image, Text
- - Input Images Supported: Multiple images within 16K input + output tokens
+ - Input Images
 - Language Supported: English only
 
 Input Format(s): Image (Red, Green, Blue (RGB)), and Text (String)
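
The input section retained above still describes image-plus-text prompting. As a minimal sketch (not taken from the card), assuming the model is served behind a vLLM OpenAI-compatible endpoint, an image question could be sent as follows; the base URL, API key, and model id are placeholders:

```python
# Sketch only: image + text chat request against an OpenAI-compatible vLLM
# server. Base URL, API key, and model id are placeholders, not values from
# this repository.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="example/vision-language-model",  # placeholder model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```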
explainability.md CHANGED
@@ -4,9 +4,9 @@ Intended Application & Domain:
 Model Type: | Transformer
 Intended Users: | Generative AI creators working with conversational AI models and image content.
 Output: | Text (Responds to posed question, stateful - remembers previous answers)
- Describe how the model works: | Chat based on image/video content
+ Describe how the model works: | Chat based on image/text
 Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
- Technical Limitations: | Max Number of images supported: 4.<br><br>**Context Length:** Supports up to 16,000 tokens total (input + output). If exceeded, input is truncated from the start, and generation ends with an EOS token. Longer prompts may risk performance loss.<br><br>If the model fails (e.g., generates incorrect responses, repeats, or gives poor responses), issues are diagnosed via benchmarks, human review, and internal debugging tools. Only use NVIDIA provided models that use safetensors format. <br><br>Do not expose the vLLM host to a network where any untrusted connections may reach the host. Only use NVIDIA provided models that use safetensors format.
+ Technical Limitations: | <br>**Context Length:** Supports up to 16,000 tokens total (input + output). If exceeded, input is truncated from the start, and generation ends with an EOS token. Longer prompts may risk performance loss.<br><br>If the model fails (e.g., generates incorrect responses, repeats, or gives poor responses), issues are diagnosed via benchmarks, human review, and internal debugging tools. Only use NVIDIA provided models that use safetensors format. <br><br>Do not expose the vLLM host to a network where any untrusted connections may reach the host. Only use NVIDIA provided models that use safetensors format.
 Verified to have met prescribed NVIDIA quality standards: | Yes
 Performance Metrics: | MMMU Val with chatGPT as a judge, AI2D, ChartQA Test, InfoVQA Val, OCRBench, OCRBenchV2 English, OCRBenchV2 Chinese, DocVQA val, VideoMME (16 frames), SlideQA (F1)
 Potential Known Risks: | The Model may produce output that is biased, toxic, or incorrect responses. Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts. The Model may also generate answers that may be inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.<br>While we have taken safety and security into account and are continuously improving, outputs may still contain political content, misleading information, or unwanted bias beyond our control.
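
The context-length limitation kept above (16,000 tokens shared between input and output, with input truncated from the start when the budget is exceeded) can be pictured with a small sketch; the function name and the 512-token generation reserve are illustrative, not part of the card:

```python
# Sketch of the truncate-from-the-start behaviour described in the
# Technical Limitations row: keep only as many input tokens as fit in the
# 16,000-token budget once room for generation is reserved.
MAX_TOTAL_TOKENS = 16_000  # input + output budget stated in the card

def fit_to_context(input_ids: list[int], max_new_tokens: int = 512) -> list[int]:
    """Drop tokens from the start so input plus planned output fits the budget."""
    budget = MAX_TOTAL_TOKENS - max_new_tokens
    if len(input_ids) <= budget:
        return input_ids
    return input_ids[-budget:]  # truncation removes the oldest input tokens

# Example: a 20,000-token prompt keeps only its most recent 15,488 tokens.
print(len(fit_to_context(list(range(20_000)))))
```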