Disco Diffusion (DD) is a Google Colab Notebook that uses an AI image-generation technique called CLIP-Guided Diffusion to let you create compelling and beautiful images from just text inputs. It was created by Somnai, augmented by Gandamu, and builds on the work of RiversHaveWings, nshepperd, and many others.
It’s magic. And also, free. (!)
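Disco Diffusion is a program that combines a diffusion model with CLIP to do text-to-image AI painting. The dataset used to train the model includes world-famous paintings and tens of thousands of artworks from the online creative site artstation.com. It runs on PyTorch with CUDA acceleration.
Ruka's personal simplified version (using jina-ai/discoart): https://colab.research.google.com/drive/1adglV_W87i69EsaQhwNnJc3O00i2tfG1
Run the first cell to check the Colab specs and make sure there is enough GPU VRAM (e.g., the 14 GB of the V100 that the free tier of Colab provides). If there is not, use Change runtime type and select GPU.
Next, run the second cell to import the packages. This step restarts the virtual machine; after the restart, run the second cell again.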
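If you would rather check the GPU from code than by eye, a minimal sketch along these lines may help. It assumes the `nvidia-smi` tool is on the PATH (it normally is on a Colab GPU runtime); the helper names and the `MIN_VRAM_MIB` threshold are made up for this example and are not part of the notebook.

```python
import shutil
import subprocess

MIN_VRAM_MIB = 12 * 1024  # hypothetical threshold, adjust to your needs


def parse_gpu_line(line):
    """Parse one CSV line such as 'Tesla T4, 15360' into (name, total_mib)."""
    name, mem = line.rsplit(",", 1)
    return name.strip(), int(mem.strip())


def query_gpus():
    """Return [(name, total_mib), ...], or [] when no usable GPU info is found."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=False, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.splitlines():
        try:
            gpus.append(parse_gpu_line(line))
        except ValueError:
            continue  # skip lines that are not 'name, number'
    return gpus


gpus = query_gpus()
if not gpus:
    print("No GPU found: use Change runtime type and select GPU.")
for name, mib in gpus:
    verdict = "ok" if mib >= MIN_VRAM_MIB else "probably too small"
    print(f"{name}: {mib} MiB ({verdict})")
```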
n_batches: (50|1–100) This variable sets the number of still images you want DD to create. If you are using an animation mode (see below for details) DD will ignore n_batches and create a single set of animated frames based on the animation settings.
text_prompts: (defaultdict(String list)) The text prompts the AI uses to render the art. Each key of the dict is the step at which its list of prompts will be considered by the AI.
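To make the shape of text_prompts concrete, here is a small sketch. The prompt texts are invented, `prompts_at` is a hypothetical helper, and it assumes that a list of prompts stays active until the next key is reached, which the description above does not spell out.

```python
# Keys are steps, values are lists of prompts (example prompts only).
text_prompts = {
    0: ["A beautiful painting of a lighthouse on a cliff", "yellow color scheme"],
    100: ["A stormy sea at night"],
}


def prompts_at(schedule, step):
    """Return the prompt list whose key is the largest one not exceeding `step`.

    Assumption: a prompt list remains in effect until a later key takes over.
    """
    keys = [k for k in sorted(schedule) if k <= step]
    if not keys:
        raise KeyError(f"no prompt scheduled at or before step {step}")
    return schedule[keys[-1]]


print(prompts_at(text_prompts, 50))   # still the lighthouse prompts
print(prompts_at(text_prompts, 150))  # ['A stormy sea at night']
```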
steps: (250|50–10000) When creating an image, the denoising curve is subdivided into steps for processing. Each step (or iteration) involves the AI looking at subsets of the image called ‘cuts’ and calculating the ‘direction’ the image should be guided to be more like the prompt. Then it adjusts the image with the help of the diffusion denoiser, and moves to the next step.
skip_steps: (10|integer up to steps) Consider the chart shown here. Noise scheduling (denoise strength) starts very high and progressively gets lower and lower as diffusion steps progress. The noise levels in the first few steps are very high, so images change dramatically in early steps.
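A toy illustration of the idea, not DD's actual scheduler: it assumes a noise strength that falls linearly from 1.0 toward 0, and that skip_steps simply drops that many of the earliest, noisiest steps. Both function names are hypothetical.

```python
def noise_levels(steps):
    """Toy linear schedule: noise strength falls from 1.0 toward 0 over `steps`."""
    return [1.0 - i / steps for i in range(steps)]


def steps_to_run(steps, skip_steps):
    """Indices of the steps that still run when the first `skip_steps` are skipped."""
    if not 0 <= skip_steps < steps:
        raise ValueError("skip_steps must be between 0 and steps - 1")
    return list(range(skip_steps, steps))


levels = noise_levels(250)
remaining = steps_to_run(250, 10)
print(len(remaining))        # 240
print(levels[0])             # 1.0, the noisiest step
print(levels[remaining[0]])  # slightly lower noise at the first step that runs
```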
init_image: Optional. Recall that in the image sequence above, the first image shown is just noise. If an init_image is provided, diffusion will replace the noise with the init_image as its starting state. To use an init_image, upload the image to the Colab instance or your Google Drive, and enter the full image path here.
clip_guidance_scale: (5000|1500–100000) CGS is one of the most important parameters you will use. It tells DD how strongly you want CLIP to move toward your prompt each timestep. Higher is generally better, but if CGS is too strong it will overshoot the goal and distort the image. So a happy medium is needed, and it takes experience to learn how to adjust CGS.
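The overshoot can be seen in a one-dimensional toy. This is only an analogy: a single number stands in for the image, a squared distance stands in for the CLIP loss, and the scale values are on a completely different range from the real clip_guidance_scale.

```python
def guide(x, target, scale, n_steps):
    """Repeatedly nudge x toward target along the gradient of (x - target)**2."""
    for _ in range(n_steps):
        grad = 2.0 * (x - target)  # gradient of the toy loss
        x = x - scale * grad       # larger scale means a stronger push per step
    return x


gentle = guide(0.0, 1.0, scale=0.1, n_steps=20)      # settles close to the target
too_strong = guide(0.0, 1.0, scale=1.2, n_steps=20)  # overshoots further each step

print(abs(gentle - 1.0))
print(abs(too_strong - 1.0))
```

With the small scale the error shrinks every step; with the large one each update jumps past the target by more than it started with, so the result drifts further and further away.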
cutn_batches: (4|1–8) Each iteration, the AI cuts the image into smaller pieces known as cuts, and compares each cut to the prompt to decide how to guide the next diffusion step. More cuts can generally lead to better images, since DD has more chances to fine-tune the image precision in each timestep.
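As a rough picture of what taking cuts means, the sketch below crops random squares out of a tiny image stored as a list of lists. The per-batch count `cutn`, the helper `random_cuts`, and the idea that the total is `cutn * cutn_batches` are assumptions made for this example; the real notebook works on tensors and may size its cuts differently.

```python
import random


def random_cuts(image, n_cuts, min_size, rng):
    """Return `n_cuts` random square crops of a 2-D list-of-lists image."""
    height, width = len(image), len(image[0])
    max_size = min(height, width)
    cuts = []
    for _ in range(n_cuts):
        size = rng.randint(min_size, max_size)
        top = rng.randint(0, height - size)
        left = rng.randint(0, width - size)
        cuts.append([row[left:left + size] for row in image[top:top + size]])
    return cuts


rng = random.Random(0)
image = [[x + 8 * y for x in range(8)] for y in range(8)]  # 8x8 dummy image

cutn, cutn_batches = 4, 4  # `cutn` is a hypothetical per-batch cut count
all_cuts = [
    cut
    for _ in range(cutn_batches)
    for cut in random_cuts(image, cutn, min_size=2, rng=rng)
]
print(len(all_cuts))  # 16
```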
display_rate: (50|5–500) During a diffusion run, you can monitor the progress of each image being created with this variable. If display_rate is set to 50, DD will show you the in-progress image every 50 timesteps.
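In loop terms this amounts to a modulo check. The helper below is hypothetical, and whether the very first step is also shown is an assumption.

```python
def preview_steps(steps, display_rate):
    """Timesteps at which an in-progress image would be shown."""
    return [s for s in range(steps) if s % display_rate == 0]


print(preview_steps(250, 50))  # [0, 50, 100, 150, 200]
```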
1. CLIP Model (January 5, 2021)
CLIP pre-trains an image encoder and a text encoder to predict which images were paired with which texts in our dataset. We then use this behavior to turn CLIP into a zero-shot classifier. We convert all of a dataset’s classes into captions such as “a photo of a dog” and predict the class of the caption CLIP estimates best pairs with a given image.
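The zero-shot step can be sketched without the real model. The three-number embeddings below are invented stand-ins for what CLIP's image and text encoders would produce, and the `temperature` factor of 100 is an assumed scaling; only the scoring logic (cosine similarity followed by a softmax over the captions) is the point.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot(image_emb, caption_embs, temperature=100.0):
    """Pick the caption whose embedding best matches the image embedding."""
    labels = list(caption_embs)
    logits = [temperature * cosine(image_emb, caption_embs[l]) for l in labels]
    probs = softmax(logits)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


# Made-up embeddings; a real run would get these from the CLIP encoders.
caption_embs = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
}
image_emb = [0.8, 0.2, 0.1]

label, probs = zero_shot(image_emb, caption_embs)
print(label)  # a photo of a dog
```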
2. Diffusion Model (May 13, 2021)
Diffusion Models are generative models, meaning that they are used to generate data similar to the data on which they are trained. Fundamentally, Diffusion Models work by destroying training data through the successive addition of Gaussian noise, and then learning to recover the data by reversing this noising process. After training, we can use the Diffusion Model to generate data by simply passing randomly sampled noise through the learned denoising process.
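The "destroying" half of that description is easy to sketch. The code below assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps (a commonly used choice, not necessarily what DD uses) and applies the closed-form noising x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps to a four-value stand-in for an image. The learned denoiser that reverses the process is not included.

```python
import math
import random


def linear_betas(n_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly from beta_start to beta_end."""
    if n_steps == 1:
        return [beta_start]
    delta = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * delta for i in range(n_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta): how much of the original signal remains."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, t, abar, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, 1)."""
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


abar = alpha_bars(linear_betas(1000))
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # stand-in for image pixels

slightly_noisy = add_noise(x0, 10, abar, rng)   # still close to x0
almost_noise = add_noise(x0, 999, abar, rng)    # nearly pure Gaussian noise
print(abar[10], abar[999])
```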
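3. Disco Diffusion drawing steps:
- First, generate a random-noise image with the diffusion model, or use init_image
- Have the diffusion model repaint that part according to how CLIP interprets it
- Repeat steps 2–4 above (steps) times, until the run ends
- It is a dictionary, so you can decide at which step to add a different drawing direction
- Within the prompt list, you can give different descriptions different weights to strengthen a drawing direction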
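One way the per-description weights in the last bullet could be handled is sketched below. It assumes a "text:weight" suffix convention and normalizes the weights so they sum to 1; both the convention and the helper names are assumptions for illustration, so check the notebook you are using for its exact syntax.

```python
def parse_prompt(prompt):
    """Split 'text:weight' into (text, weight); the weight defaults to 1.0."""
    text, sep, tail = prompt.rpartition(":")
    if sep:
        try:
            return text.strip(), float(tail)
        except ValueError:
            pass  # the part after the last ':' was not a number
    return prompt.strip(), 1.0


def normalized_weights(prompts):
    """Parse every prompt and rescale the weights so they sum to 1."""
    parsed = [parse_prompt(p) for p in prompts]
    total = sum(weight for _, weight in parsed)
    if total == 0:
        raise ValueError("prompt weights must not sum to zero")
    return [(text, weight / total) for text, weight in parsed]


print(parse_prompt("a lighthouse on a cliff:2"))  # ('a lighthouse on a cliff', 2.0)
print(parse_prompt("stormy sea"))                 # ('stormy sea', 1.0)
print(normalized_weights(["a lighthouse on a cliff:3", "stormy sea:1"]))
```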