Qwen-Image-Edit-2509-Turbo-Lightning

LPX55 committed on Nov 10

Commit 63f5ee9 (verified) · 1 Parent(s): 54c487a

Update app_temp.py

Files changed (1)

app_temp.py +86 -40

@@ -18,21 +18,18 @@ import os
 import base64
 import json
 SYSTEM_PROMPT = '''
 # Edit Instruction Rewriter
 You are a professional edit instruction rewriter. Your task is to generate a precise, concise, and visually achievable professional-level edit instruction based on the user-provided instruction and the image to be edited.
 Please strictly follow the rewriting rules below:
 ## 1. General Principles
 - Keep the rewritten prompt **concise and comprehensive**. Avoid overly long sentences and unnecessary descriptive language.
 - If the instruction is contradictory, vague, or unachievable, prioritize reasonable inference and correction, and supplement details when necessary.
 - Keep the main part of the original instruction unchanged, only enhancing its clarity, rationality, and visual feasibility.
 - All added objects or modifications must align with the logic and style of the scene in the input images.
 - If multiple sub-images are to be generated, describe the content of each sub-image individually.
 ## 2. Task-Type Handling Rules
 ### 1. Add, Delete, Replace Tasks
 - If the instruction is clear (already includes task type, target entity, position, quantity, attributes), preserve the original intent and only refine the grammar.
 - If the description is vague, supplement with minimal but sufficient details (category, color, size, orientation, position, etc.). For example:
@@ -40,7 +37,6 @@ Please strictly follow the rewriting rules below:
     > Rewritten: "Add a light-gray cat in the bottom-right corner, sitting and facing the camera"
 - Remove meaningless instructions: e.g., "Add 0 objects" should be ignored or flagged as invalid.
 - For replacement tasks, specify "Replace Y with X" and briefly describe the key visual features of X.
 ### 2. Text Editing Tasks
 - All text content must be enclosed in English double quotes `" "`. Keep the original language of the text, and keep the capitalization.
 - Both adding new text and replacing existing text are text replacement tasks, For example:
@@ -49,14 +45,12 @@ Please strictly follow the rewriting rules below:
     - Replace the visual object to "yy"
 - Specify text position, color, and layout only if user has required.
 - If font is specified, keep the original language of the font.
 ### 3. Human Editing Tasks
 - Make the smallest changes to the given user's prompt.
 - If changes to background, action, expression, camera shot, or ambient lighting are required, please list each modification individually.
-- **Edits to makeup or facial features / expression must be subtle, not exaggerated, and must preserve the subject’s identity consistency.**
     > Original: "Add eyebrows to the face"
-    > Rewritten: "Slightly thicken the person’s eyebrows with little change, look natural."
 ### 4. Style Conversion or Enhancement Tasks
 - If a style is specified, describe it concisely using key visual features. For example:
     > Original: "Disco style"
@@ -67,12 +61,10 @@ Please strictly follow the rewriting rules below:
 - Clearly specify the object to be modified. For example:
     > Original: Modify the subject in Picture 1 to match the style of Picture 2.
     > Rewritten: Change the girl in Picture 1 to the ink-wash style of Picture 2 — rendered in black-and-white watercolor with soft color transitions.
 ### 5. Material Replacement
 - Clearly specify the object and the material. For example: "Change the material of the apple to papercut style."
 - For text material replacement, use the fixed template:
     "Change the material of text "xxxx" to laser style"
 ### 6. Logo/Pattern Editing
 - Material replacement should preserve the original shape and structure as much as possible. For example:
    > Original: "Convert to sapphire material"
@@ -80,55 +72,96 @@ Please strictly follow the rewriting rules below:
 - When migrating logos/patterns to new scenes, ensure shape and structure consistency. For example:
    > Original: "Migrate the logo in the image to a new scene"
    > Rewritten: "Migrate the logo in the image to a new scene, preserving similar shape and structure"
 ### 7. Multi-Image Tasks
-- Rewritten prompts must clearly point out which image’s element is being modified. For example:
     > Original: "Replace the subject of picture 1 with the subject of picture 2"
-    > Rewritten: "Replace the girl of picture 1 with the boy of picture 2, keeping picture 2’s background unchanged"
-- For stylization tasks, describe the reference image’s style in the rewritten prompt, while preserving the visual content of the source image.
 ## 3. Rationale and Logic Check
-- Resolve contradictory instructions: e.g., “Remove all trees but keep all trees” requires logical correction.
 - Supplement missing critical information: e.g., if position is unspecified, choose a reasonable area based on composition (near subject, blank space, center/edge, etc.).
 # Output Format Example
 ```json
 {
    "Rewritten": "..."
 }
 '''
-# --- Prompt Enhancement using Hugging Face InferenceClient ---
-def polish_prompt_hf(prompt, img_list):
     """
     Rewrites the prompt using a Hugging Face InferenceClient.
     """
     # Ensure HF_TOKEN is set
     api_key = os.environ.get("HF_TOKEN")
     if not api_key:
         print("Warning: HF_TOKEN not set. Falling back to original prompt.")
-        return prompt
     try:
         # Initialize the client
-        prompt = f"{SYSTEM_PROMPT}\n\nUser Input: {prompt}\n\nRewritten Prompt:"
-            # Initialize the client
         client = InferenceClient(
-            provider="novita",
             api_key=api_key,
         )
         # Format the messages for the chat completions API
-        sys_promot = "you are a helpful assistant, you should provide useful answers to users."
         messages = [
-            {"role": "system", "content": sys_promot},
-            {"role": "user", "content": []}]
-        for img in img_list:
-            messages[1]["content"].append(
-                {"image": f"data:image/png;base64,{encode_image(img)}"})
-        messages[1]["content"].append({"text": f"{prompt}"})
         completion = client.chat.completions.create(
-            model="Qwen/Qwen3-Next-80B-A3B-Instruct",
             messages=messages,
         )
@@ -136,7 +169,7 @@ def polish_prompt_hf(prompt, img_list):
         result = completion.choices[0].message.content
         # Try to extract JSON if present
-        if '{"Rewritten"' in result:
             try:
                 # Clean up the response
                 result = result.replace('```json', '').replace('```', '')
@@ -153,9 +186,7 @@ def polish_prompt_hf(prompt, img_list):
     except Exception as e:
         print(f"Error during API call to Hugging Face: {e}")
         # Fallback to original prompt if enhancement fails
-        return prompt
 def encode_image(pil_image):
     import io
@@ -208,6 +239,12 @@ optimize_pipeline_(pipe, image=[Image.new("RGB", (1024, 1024)), Image.new("RGB",
 # --- UI Constants and Helpers ---
 MAX_SEED = np.iinfo(np.int32).max
 # --- Main Inference Function (with hardcoded negative prompt) ---
 @spaces.GPU(duration=40)
 def infer(
@@ -220,7 +257,7 @@ def infer(
     height=None,
     width=None,
     rewrite_prompt=True,
-    num_images_per_prompt=1,
     progress=gr.Progress(track_tqdm=True),
 ):
     """
@@ -368,9 +405,17 @@ with gr.Blocks(css=css) as demo:
                     step=8,
                     value=None,
                 )
-                rewrite_prompt = gr.Checkbox(label="Rewrite prompt (being fixed)", value=False)
         # gr.Examples(examples=examples, inputs=[prompt], outputs=[result, seed], fn=infer, cache_examples=False)
@@ -387,6 +432,7 @@ with gr.Blocks(css=css) as demo:
             height,
             width,
             rewrite_prompt,
         ],
         outputs=[result, seed],
     )

 import base64
 import json
 SYSTEM_PROMPT = '''
 # Edit Instruction Rewriter
 You are a professional edit instruction rewriter. Your task is to generate a precise, concise, and visually achievable professional-level edit instruction based on the user-provided instruction and the image to be edited.
 Please strictly follow the rewriting rules below:
 ## 1. General Principles
 - Keep the rewritten prompt **concise and comprehensive**. Avoid overly long sentences and unnecessary descriptive language.
 - If the instruction is contradictory, vague, or unachievable, prioritize reasonable inference and correction, and supplement details when necessary.
 - Keep the main part of the original instruction unchanged, only enhancing its clarity, rationality, and visual feasibility.
 - All added objects or modifications must align with the logic and style of the scene in the input images.
 - If multiple sub-images are to be generated, describe the content of each sub-image individually.
 ## 2. Task-Type Handling Rules
 ### 1. Add, Delete, Replace Tasks
 - If the instruction is clear (already includes task type, target entity, position, quantity, attributes), preserve the original intent and only refine the grammar.
 - If the description is vague, supplement with minimal but sufficient details (category, color, size, orientation, position, etc.). For example:
     > Rewritten: "Add a light-gray cat in the bottom-right corner, sitting and facing the camera"
 - Remove meaningless instructions: e.g., "Add 0 objects" should be ignored or flagged as invalid.
 - For replacement tasks, specify "Replace Y with X" and briefly describe the key visual features of X.
 ### 2. Text Editing Tasks
 - All text content must be enclosed in English double quotes `" "`. Keep the original language of the text, and keep the capitalization.
 - Both adding new text and replacing existing text are text replacement tasks, For example:
     - Replace the visual object to "yy"
 - Specify text position, color, and layout only if user has required.
 - If font is specified, keep the original language of the font.
 ### 3. Human Editing Tasks
 - Make the smallest changes to the given user's prompt.
 - If changes to background, action, expression, camera shot, or ambient lighting are required, please list each modification individually.
+- **Edits to makeup or facial features / expression must be subtle, not exaggerated, and must preserve the subject's identity consistency.**
     > Original: "Add eyebrows to the face"
+    > Rewritten: "Slightly thicken the person's eyebrows with little change, look natural."
 ### 4. Style Conversion or Enhancement Tasks
 - If a style is specified, describe it concisely using key visual features. For example:
     > Original: "Disco style"
 - Clearly specify the object to be modified. For example:
     > Original: Modify the subject in Picture 1 to match the style of Picture 2.
     > Rewritten: Change the girl in Picture 1 to the ink-wash style of Picture 2 — rendered in black-and-white watercolor with soft color transitions.
 ### 5. Material Replacement
 - Clearly specify the object and the material. For example: "Change the material of the apple to papercut style."
 - For text material replacement, use the fixed template:
     "Change the material of text "xxxx" to laser style"
 ### 6. Logo/Pattern Editing
 - Material replacement should preserve the original shape and structure as much as possible. For example:
    > Original: "Convert to sapphire material"
 - When migrating logos/patterns to new scenes, ensure shape and structure consistency. For example:
    > Original: "Migrate the logo in the image to a new scene"
    > Rewritten: "Migrate the logo in the image to a new scene, preserving similar shape and structure"
 ### 7. Multi-Image Tasks
+- Rewritten prompts must clearly point out which image's element is being modified. For example:
     > Original: "Replace the subject of picture 1 with the subject of picture 2"
+    > Rewritten: "Replace the girl of picture 1 with the boy of picture 2, keeping picture 2's background unchanged"
+- For stylization tasks, describe the reference image's style in the rewritten prompt, while preserving the visual content of the source image.
 ## 3. Rationale and Logic Check
+- Resolve contradictory instructions: e.g., "Remove all trees but keep all trees" requires logical correction.
 - Supplement missing critical information: e.g., if position is unspecified, choose a reasonable area based on composition (near subject, blank space, center/edge, etc.).
 # Output Format Example
 ```json
 {
    "Rewritten": "..."
 }
 '''
+def polish_prompt_hf(original_prompt, img_list):
     """
     Rewrites the prompt using a Hugging Face InferenceClient.
+    Supports multiple images via img_list.
     """
     # Ensure HF_TOKEN is set
     api_key = os.environ.get("HF_TOKEN")
     if not api_key:
         print("Warning: HF_TOKEN not set. Falling back to original prompt.")
+        return original_prompt
+    prompt = f"{SYSTEM_PROMPT}\n\nUser Input: {original_prompt}\n\nRewritten Prompt:"
+    system_prompt = "you are a helpful assistant, you should provide useful answers to users."
     try:
         # Initialize the client
         client = InferenceClient(
+            provider="nebius",
             api_key=api_key,
         )
+        # Convert list of images to base64 data URLs
+        image_urls = []
+        if img_list is not None:
+            # Ensure img_list is actually a list
+            if not isinstance(img_list, list):
+                img_list = [img_list]
+            for img in img_list:
+                image_url = None
+                # If img is a PIL Image
+                if hasattr(img, 'save'):  # Check if it's a PIL Image
+                    buffered = BytesIO()
+                    img.save(buffered, format="PNG")
+                    img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')
+                    image_url = f"data:image/png;base64,{img_base64}"
+                # If img is already a file path (string)
+                elif isinstance(img, str):
+                    with open(img, "rb") as image_file:
+                        img_base64 = base64.b64encode(image_file.read()).decode('utf-8')
+                    image_url = f"data:image/png;base64,{img_base64}"
+                else:
+                    print(f"Warning: Unexpected image type: {type(img)}, skipping...")
+                    continue
+                if image_url:
+                    image_urls.append(image_url)
+        # Build the content array with text first, then all images
+        content = [
+            {
+                "type": "text",
+                "text": prompt
+            }
+        ]
+        # Add all images to the content
+        for image_url in image_urls:
+            content.append({
+                "type": "image_url",
+                "image_url": {
+                    "url": image_url
+                }
+            })
         # Format the messages for the chat completions API
         messages = [
+            {"role": "system", "content": system_prompt},
+            {
+                "role": "user",
+                "content": content
+            }
+        ]
+        # Call the API
         completion = client.chat.completions.create(
+            model="Qwen/Qwen2.5-VL-72B-Instruct",
             messages=messages,
         )
         result = completion.choices[0].message.content
         # Try to extract JSON if present
+        if '"Rewritten"' in result:
             try:
                 # Clean up the response
                 result = result.replace('```json', '').replace('```', '')
     except Exception as e:
         print(f"Error during API call to Hugging Face: {e}")
         # Fallback to original prompt if enhancement fails
+        return original_prompt
 def encode_image(pil_image):
     import io
 # --- UI Constants and Helpers ---
 MAX_SEED = np.iinfo(np.int32).max
+def use_output_as_input(output_images):
+    """Convert output images to input format for the gallery"""
+    if output_images is None or len(output_images) == 0:
+        return []
+    return output_images
 # --- Main Inference Function (with hardcoded negative prompt) ---
 @spaces.GPU(duration=40)
 def infer(
     height=None,
     width=None,
     rewrite_prompt=True,
+    num_images_per_prompt=2,
     progress=gr.Progress(track_tqdm=True),
 ):
     """
                     step=8,
                     value=None,
                 )
+                num_images = gr.Slider(
+                    label="Num Images per Prompt",
+                    minimum=1,
+                    maximum=4,
+                    step=1,
+                    value=2,
+                )
+            rewrite_prompt = gr.Checkbox(label="Rewrite prompt (being fixed)", value=False)
         # gr.Examples(examples=examples, inputs=[prompt], outputs=[result, seed], fn=infer, cache_examples=False)
             height,
             width,
             rewrite_prompt,
+            num_images
         ],
         outputs=[result, seed],
     )
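
Note on the multi-image handling added to `polish_prompt_hf`: the image-to-data-URL conversion and the content list construction can be read as two small steps. The sketch below isolates them so they can be exercised without calling the API. The helper names `to_data_url` and `build_user_content` are hypothetical and do not appear in the commit. One deliberate difference, flagged as an assumption: for file paths the sketch guesses the MIME type from the extension with `mimetypes`, whereas the committed code labels every file as `image/png`.

```python
import base64
import mimetypes
from io import BytesIO


def to_data_url(img):
    """Return a base64 data URL for a PIL-like image or a file path, else None.

    Hypothetical helper; not part of the commit.
    """
    if hasattr(img, "save"):  # duck-typed PIL image check, as in the diff
        buffered = BytesIO()
        img.save(buffered, format="PNG")
        mime, raw = "image/png", buffered.getvalue()
    elif isinstance(img, str):
        # Assumption: guess the MIME type from the file extension rather
        # than always reporting image/png as the committed code does.
        mime = mimetypes.guess_type(img)[0] or "application/octet-stream"
        with open(img, "rb") as image_file:
            raw = image_file.read()
    else:
        # Unknown type: the caller skips it, mirroring the diff's `continue`.
        return None
    encoded = base64.b64encode(raw).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def build_user_content(prompt, img_list=None):
    """Build the user message content: the text part first, then one part per image."""
    if img_list is None:
        img_list = []
    elif not isinstance(img_list, list):
        img_list = [img_list]
    content = [{"type": "text", "text": prompt}]
    for img in img_list:
        url = to_data_url(img)
        if url is not None:
            content.append({"type": "image_url", "image_url": {"url": url}})
    return content
```

The returned list has the same shape as the `content` value the commit places in the `"user"` message, so it could be passed in the same position if the helpers were adopted.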