Why automate podcast updates?
Anyone who makes content knows the biggest problem isn't just creation; publishing is painfully tedious too. Uploading audio files one by one, writing descriptions, syncing cover art: just pushing a single episode to five platforms is enough to make you give up.
But what if you could upload once and have every platform update automatically? That would save time and effort and still look polished. That is the idea behind this workflow: AI-generated podcast episodes plus multi-platform syndication.
Step 1: Generate podcast content in bulk with AI
First, the core question: where does the podcast content come from?
My approach: take well-structured or clearly themed material (e-book chapters, collections of technical articles, industry reports, and so on) and import it into NotebookLM in batches.
What makes this tool impressive is that it understands the source material, and with a prompt you can have it generate a fairly high-quality audio segment in the tone of a podcast host. By adjusting the prompt, you can refine the generated audio from different angles.
Step 2: Create the podcast on a primary platform and upload the audio
Once the content is ready, you need a "home base" for hosting the audio files. I tried two:
Spotify for Podcasters (international)
Ximalaya 喜马拉雅 (China)
Both have the same key traits: they can generate an RSS feed, and they have large user bases.
That RSS feed is critical. It acts as the podcast's distribution hub: with it, other platforms can automatically subscribe to and pull updates of your podcast.
A note on Ximalaya's RSS feed, which is buried deep and hard to find: log in to the Ximalaya creator center, click "创作实验室" (Creation Lab), and choose "Apple 播客托管服务" (Apple Podcasts hosting service) to generate an RSS feed. Despite the name, this feed works not only for syncing to Apple Podcasts but for other podcast platforms as well.
Spotify's RSS feed is much easier to find: log in to the Spotify for Creators dashboard, go to "Settings" > "Availability", and the link is listed under RSS Distribution.
Upload the audio, set the title, description, and cover art, and the podcast is live; the corresponding RSS link is generated automatically at the same time.
Step 3: Sync the RSS feed to the other platforms
Take the RSS link you just generated and register on the following platforms, importing it there: Apple Podcasts, YouTube Music, NetEase Cloud Music (网易云音乐), Xiaoyuzhou (小宇宙), Pocket Casts, and so on.
When creating a podcast, these platforms generally offer an "import from RSS" option; paste your link and they will fetch updates automatically.
That way you only maintain the single source on Ximalaya or Spotify, and every other platform syncs automatically with nothing left for you to manage.
From then on, the update flow is "upload once, sync everywhere".
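Before submitting the link everywhere, it is worth a quick sanity check that the feed parses and lists your episodes. Below is a minimal sketch, assuming the feedparser package is installed and using a placeholder feed URL (substitute your own Ximalaya or Spotify RSS link):
import feedparser   # assumes: pip install feedparser

FEED_URL = "https://example.com/your-podcast-feed.xml"   # placeholder, not a real feed

feed = feedparser.parse(FEED_URL)
print("Podcast title:", feed.feed.get("title"))
print("Episodes found:", len(feed.entries))
for entry in feed.entries[:5]:
    # Each entry should carry an audio enclosure that the platforms will download.
    enclosures = [link.href for link in entry.links if link.get("rel") == "enclosure"]
    print("-", entry.get("title"), enclosures)
If the title and enclosures show up here, the feed is generally good enough for the import forms on the other platforms.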
From the second episode on, things get easy. The flow:
Drop the new material into NotebookLM, design a prompt, and generate the audio.
Upload the audio to your primary platform (for example, Ximalaya).
Every platform bound to the RSS feed updates automatically.
Isn't this exactly the "automated workflow" we programmers love?
A few practical tips
Source limits: NotebookLM has the following limits on the text it processes (including Chinese): 1. Per source file: each source uploaded to NotebookLM (PDF, Google Doc, text file, etc.) is capped at 500,000 words, and each uploaded local file at 200 MB. 2. Per notebook: regular users can add at most 50 sources to a notebook. (A rough pre-flight check is sketched after these tips.)
Pacing: keep AI-generated episodes to around 10-20 minutes; they are neither tiring to produce nor hard to listen to in full.
Artwork and covers: ChatGPT produces images quickly and at good quality, and you can generate matching promo images for social media while you're at it.
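As promised above, here is a rough pre-flight checker for the source limits. It is not an official NotebookLM tool: the limits are simply the numbers quoted in the tip, the folder path and plain-text-only assumption are hypothetical, and the character count is only a crude proxy for NotebookLM's word count.
# Hypothetical pre-flight check for NotebookLM source limits (not an official tool).
from pathlib import Path

MAX_CHARS_PER_SOURCE = 500_000       # "500,000 words" quoted above; characters used as a rough proxy
MAX_FILE_SIZE_MB = 200
MAX_SOURCES_PER_NOTEBOOK = 50

def check_sources(folder: str) -> None:
    files = sorted(Path(folder).glob("*.txt"))   # hypothetical: plain-text sources only
    if len(files) > MAX_SOURCES_PER_NOTEBOOK:
        print(f"Too many sources: {len(files)} > {MAX_SOURCES_PER_NOTEBOOK}")
    for f in files:
        size_mb = f.stat().st_size / (1024 * 1024)
        chars = len(f.read_text(encoding="utf-8", errors="ignore"))
        ok = size_mb <= MAX_FILE_SIZE_MB and chars <= MAX_CHARS_PER_SOURCE
        print(f"{f.name}: {chars} chars, {size_mb:.1f} MB -> {'OK' if ok else 'over the limit'}")

check_sources("./podcast_sources")   # hypothetical folder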
Conclusion: don't wait for perfect, get started first
Many people assume podcasting has a high barrier to entry, but with today's AI tools and automated syndication it really isn't hard. What matters isn't being perfect from the start; it's getting things running first and optimizing gradually.
If you also want to make a knowledge-focused podcast, this approach is worth a try: lightweight, automated, low-maintenance, and scalable.
My podcast links
Here are the podcast pages I created on each platform:
YouTube:https://www.youtube.com/@williamlong
Apple Podcasts:https://podcasts.apple.com/podcast/%E6%9C%88%E5%85%89%E6%92%AD%E5%AE%A2/id1816103541
Spotify:https://open.spotify.com/show/5L8RZKHcSwfLFzC1qFNNs6
喜马拉雅:https://www.ximalaya.com/album/92461056
网易云音乐:https://music.163.com/#/djradio?id=1224404483
小宇宙:https://www.xiaoyuzhoufm.com/podcast/68302549457b22ce0d25dc08
<div id="google_translate_element"></div>
<script>
function googleTranslateElementInit() {
new google.translate.TranslateElement({
pageLanguage: "zh-TW",
autoDisplay: false,
layout: google.translate.TranslateElement.InlineLayout.VERTICAL
}, "google_translate_element");
}
</script>
<script src="//translate.google.com/translate_a/element.js?cb=googleTranslateElementInit"></script>
<script src='//ajax.googleapis.com/ajax/libs/jquery/2.0.0/jquery.min.js'></script>
See "Notes on including jQuery" to check whether your template already has jQuery installed; if it does, delete that line to avoid loading it twice.
2. Install the code
Decide where on your site you want the flag buttons to appear; see "Blogger template section code" to find the spot you want.
In the dashboard, go to "Theme" → the dropdown icon to the right of the "Customize" button → "Edit HTML", click into the template, find the location you chose, and paste the following code:
<!--國旗翻譯工具-->
<div id="flag_translate">
<img class="en" src="https://3.bp.blogspot.com/-lPe1MfSK7zs/UXaYYpJ-lJI/AAAAAAAAGiU/FIBrY3aIhW0/s1600/eng.jpg" alt="英文">
<img class="zh-CN" src="https://4.bp.blogspot.com/-HqR8f67uo9g/UXaVG1JlVEI/AAAAAAAAGhs/9Ak-hJ-UGRA/s1600/cn.jpg" alt="簡中">
<img class="ja" src="https://1.bp.blogspot.com/-kCzI3AnvG1c/UXaYZZ7IBDI/AAAAAAAAGic/bt6V0kD-Ong/s1600/jp.jpg" alt="日文">
</div>
<style>
#flag_translate img{margin-right: 10px; cursor: pointer;}
</style>
<script>
//<![CDATA[
$("#flag_translate img").click(function() {
let className = this.className;
let $select = $("#google_translate_element select");
$select.val(className);
setTimeout(function() {
$select[0].dispatchEvent(new Event("change"));
}, 10);
});
//]]>
</script>
<!--Designed by WFU BLOG-->
Save, and you should see the result.
3. Modify the parameters
If you want to customize the parameters or the flag images, see the notes below:
- localhost:6000 does not work because port 6000 is one of the ports that browsers block by default.
- CSS tricks for <dialog> (in English): an article introducing two CSS techniques for the <dialog> element that are meant to be used together.
- A company introduction page usually sits at a URL like slack.com/about, but Slack did not take that approach: it uses the path "is" instead, split into several sub-pages under slack.com/is:
- slack.com/is/team-communication
- slack.com/is/everything-in-one-place
- slack.com/is/wherever-you-are
- "is" also happens to be a top-level domain, the country code for Iceland, and many well-known people have registered .is domains as personal homepages. One example is jessicahische.is, where all of her self-introduction pages use URLs of the form jessicahische.is/xxx.
apt install python3-pip python3.10-venv
pip install huggingface-hub wheel jupyter
git clone https://github.com/ByteDance-Seed/Bagel
cd Bagel/
vim requirements.txt
Then use Vim to remove the 16th line, flash_attn==2.5.8, from the text file. This will prevent a broken install later on. Alternatively, you can directly remove it by accessing the host machine from your local VS Code application and using the file editor to modify the text file. Once that's done, we can continue with installation.
pip install -r requirements.txt
pip install git+https://github.com/Dao-AILab/flash-attention
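If you would rather script that edit than open Vim, a small optional sketch like the following (run from inside the Bagel/ directory, before the pip install -r requirements.txt step) drops the pinned flash_attn line. The filtering logic is our own assumption, not part of the official BAGEL instructions:
from pathlib import Path

req = Path("requirements.txt")   # assumes we are inside the Bagel/ checkout
kept = [line for line in req.read_text().splitlines()
        if not line.strip().startswith("flash_attn")]
req.write_text("\n".join(kept) + "\n")
print("Removed the flash_attn pin; flash-attention is installed from source instead.")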
This will complete the installation of all the required packages for this demo. Next, we recommend using the Hugging Face CLI to log in to your Hugging Face account. We can then download the model files for the demo.
huggingface-cli login
# After entering your access token
huggingface-cli download ByteDance-Seed/BAGEL-7B-MoT --local-dir ./BAGEL-7B-MoT
Alternatively, you can download the model files in your notebook later on using the following Python code:
from huggingface_hub import snapshot_download
save_dir = "./BAGEL-7B-MoT"
repo_id = "ByteDance-Seed/BAGEL-7B-MoT"
cache_dir = save_dir + "/cache"
snapshot_download(
    cache_dir=cache_dir,
    local_dir=save_dir,
    repo_id=repo_id,
    local_dir_use_symlinks=False,
    resume_download=True,
    allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
)
Once your model weights are downloaded, we can open our Jupyter Notebook demo. Paste the following command into the terminal to launch Jupyter Lab:
jupyter lab --allow-root
Then open the inference.ipynb notebook file, and run the first 7 code cells. These are labeled to correspond to what each does for the setup process.
import os
from copy import deepcopy
from typing import (
Any,
AsyncIterable,
Callable,
Dict,
Generator,
List,
NamedTuple,
Optional,
Tuple,
Union,
)
import requests
from io import BytesIO
from PIL import Image
import torch
from accelerate import infer_auto_device_map, load_checkpoint_and_dispatch, init_empty_weights
from data.transforms import ImageTransform
from data.data_utils import pil_img2rgb, add_special_tokens
from modeling.bagel import (
BagelConfig, Bagel, Qwen2Config, Qwen2ForCausalLM, SiglipVisionConfig, SiglipVisionModel
)
from modeling.qwen2 import Qwen2Tokenizer
from modeling.bagel.qwen2_navit import NaiveCache
from modeling.autoencoder import load_ae
from safetensors.torch import load_file
This first cell loads all the required packages for the demo. Next, we initialize the model weights. Be sure to edit the value on line 1 to reflect the path to your model weights, ./BAGEL-7B-MoT.
model_path = "./BAGEL-7B-MoT"
# LLM config preparing
llm_config = Qwen2Config.from_json_file(os.path.join(model_path, "llm_config.json"))
llm_config.qk_norm = True
llm_config.tie_word_embeddings = False
llm_config.layer_module = "Qwen2MoTDecoderLayer"
# ViT config preparing
vit_config = SiglipVisionConfig.from_json_file(os.path.join(model_path, "vit_config.json"))
vit_config.rope = False
vit_config.num_hidden_layers = vit_config.num_hidden_layers - 1
# VAE loading
vae_model, vae_config = load_ae(local_path=os.path.join(model_path, "ae.safetensors"))
# Bagel config preparing
config = BagelConfig(
visual_gen=True,
visual_und=True,
llm_config=llm_config,
vit_config=vit_config,
vae_config=vae_config,
vit_max_num_patch_per_side=70,
connector_act='gelu_pytorch_tanh',
latent_patch_size=2,
max_latent_size=64,
)
with init_empty_weights():
    language_model = Qwen2ForCausalLM(llm_config)
    vit_model = SiglipVisionModel(vit_config)
    model = Bagel(language_model, vit_model, config)
    model.vit_model.vision_model.embeddings.convert_conv2d_to_linear(vit_config, meta=True)
# Tokenizer Preparing
tokenizer = Qwen2Tokenizer.from_pretrained(model_path)
tokenizer, new_token_ids, _ = add_special_tokens(tokenizer)
# Image Transform Preparing
vae_transform = ImageTransform(1024, 512, 16)
vit_transform = ImageTransform(980, 224, 14)
Next, we will load the model for single or multi GPU inference. Edit the value on line 1 to reflect the corresponding amount of VRAM on your system, either 80 or 640.
max_mem_per_gpu = "80GiB"  # Modify it according to your GPU setting
device_map = infer_auto_device_map(
model,
max_memory={i: max_mem_per_gpu for i in range(torch.cuda.device_count())},
no_split_module_classes=["Bagel", "Qwen2MoTDecoderLayer"],
)
print(device_map)
same_device_modules = [
'language_model.model.embed_tokens',
'time_embedder',
'latent_pos_embed',
'vae2llm',
'llm2vae',
'connector',
'vit_pos_embed'
]
if torch.cuda.device_count() == 1:
    first_device = device_map.get(same_device_modules[0], "cuda:0")
    for k in same_device_modules:
        if k in device_map:
            device_map[k] = first_device
        else:
            device_map[k] = "cuda:0"
else:
    first_device = device_map.get(same_device_modules[0])
    for k in same_device_modules:
        if k in device_map:
            device_map[k] = first_device
model = load_checkpoint_and_dispatch(
    model,
    checkpoint=model_path + "/ema.safetensors",
    device_map=device_map,
    offload_buffers=True,
    dtype=torch.bfloat16,
)
model = model.eval()
print('Model loaded')
In the next two cells, we load the inferencer package to build our inference pipeline, and we set a fixed seed so that results are reproducible.
from inferencer import InterleaveInferencer
inferencer = InterleaveInferencer(
model=model,
vae_model=vae_model,
tokenizer=tokenizer,
vae_transform=vae_transform,
vit_transform=vit_transform,
new_token_ids=new_token_ids
)
import random
import numpy as np
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
With that, we have set everything up! The model is loaded onto the GPU and the pipeline is ready for us to begin doing inference. In the next sections, we outline the model's general capabilities as shown by the examples.
inference_hyper=dict(
cfg_text_scale=4.0,
cfg_img_scale=1.0,
cfg_interval=[0.4, 1.0],
timestep_shift=3.0,
num_timesteps=50,
cfg_renorm_min=1.0,
cfg_renorm_type="global",
)
prompt = '''draw the famous actor Keanu Reeves with long hair and a beard eating Ramen noodles while sitting next to an anthromorphic bear wearing a bowtie, both laughing, drawing in anime style'''
print(prompt)
print('-' * 10)
output_dict = inferencer(text=prompt, **inference_hyper)
display(output_dict['image'])
The result can be seen below, along with a code snippet used for saving the image.
from PIL import Image
import PIL
output_dict['image'].save('output_img_gen.png')
As we can see, the model does an impressive job of capturing the challenging nature of the prompt. It not only placed the subjects correctly, but understood the desired style and expressions we described in the prompt. How BAGEL holds up against larger VLMs like GPT-4o or dedicated image generation models like FLUX.1 Pro remains to be seen, but we are so far impressed with the results.
inference_hyper=dict(
max_think_token_n=1000,
do_sample=False,
# text_temperature=0.3,
cfg_text_scale=4.0,
cfg_img_scale=1.0,
cfg_interval=[0.4, 1.0],
timestep_shift=3.0,
num_timesteps=50,
cfg_renorm_min=1.0,
cfg_renorm_type="global",
)
prompt = '''draw the famous actor Keanu Reeves with long hair and a beard eating Ramen noodles while sitting next to an anthromorphic bear wearing a bowtie, Keanu is wearing a shirt that says "DigitalOean", both are laughing, drawing in anime style'''
print(prompt)
print('-' * 10)
output_dict = inferencer(text=prompt, think=True, **inference_hyper)
print(output_dict['text'])
display(output_dict['image'])
This produces the sample thinking text and the subsequent generated image. The edited prompt seems to have become the following:
inference_hyper=dict(
cfg_text_scale=4.0,
cfg_img_scale=2.0,
cfg_interval=[0.0, 1.0],
timestep_shift=4.0,
num_timesteps=50,
cfg_renorm_min=1.0,
cfg_renorm_type="text_channel",
)
image = Image.open('./output_img_gen_think.png')
prompt = 'make his shirt say "DIGITALOCEAN"'
display(image)
print(prompt)
print('-'*10)
output_dict = inferencer(image=image, text=prompt, **inference_hyper)
display(output_dict['image'])
This will run the pipeline. For our example, we used the image we generated from the second task. We instructed the model to edit the original image to have “DigitalOcean” written onto the human character’s shirt. We can view the example below.
inference_hyper=dict(
max_think_token_n=1000,
do_sample=False,
# text_temperature=0.3,
cfg_text_scale=4.0,
cfg_img_scale=2.0,
cfg_interval=[0.4, 1.0],
timestep_shift=3.0,
num_timesteps=50,
cfg_renorm_min=0.0,
cfg_renorm_type="text_channel",
)
image = Image.open('./test_images/octupusy.jpg')
prompt = 'Could you display the sculpture that takes after this design?'
display(image)
print('-'*10)
output_dict = inferencer(image=image, text=prompt, think=True, **inference_hyper)
print(output_dict['text'])
display(output_dict['image'])
After running the example, we should be left with something approximating the image below:
inference_hyper=dict(
max_think_token_n=1000,
do_sample=False,
# text_temperature=0.3,
)
image = Image.open('./test_images/meme.jpg')
prompt = "Can someone explain what’s funny about this meme??"
display(image)
print(prompt)
print('-'*10)
output_dict = inferencer(image=image, text=prompt, understanding_output=True, **inference_hyper)
print(output_dict['text'])
This provided us with the following output to the shown input image:
CREATE TABLE table_name (
column1_name data_type PRIMARY KEY,
column2_name data_type,
...
);
Role of a Primary Key: the id column serves as the primary key, ensuring each user has a unique identifier.
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255)
);
CREATE TABLE table_name (
column1_name data_type,
column2_name data_type,
...
);
Command | Syntax | Description | Example |
---|---|---|---|
CREATE DATABASE | CREATE DATABASE database_name; | Creates a new database | CREATE DATABASE mydatabase; |
USE | USE database_name; | Selects the database to use for the current session | USE mydatabase; |
CREATE TABLE | CREATE TABLE table_name (column1_name data_type, column2_name data_type, ...); | Creates a new table in the database | CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(255), email VARCHAR(255)); |
INSERT INTO | INSERT INTO table_name (column1_name, column2_name, ...) VALUES (value1, value2, ...); | Inserts new records into a table | INSERT INTO users (name, email) VALUES ('John Doe', 'john@example.com'); |
SELECT | SELECT column1_name, column2_name, ... FROM table_name; | Retrieves data from a database table | SELECT * FROM users; |
UPDATE | UPDATE table_name SET column1_name = value1, column2_name = value2, ... WHERE condition; | Updates existing records in a table | UPDATE users SET name = 'Jane Doe' WHERE id = 1; |
REPLACE | REPLACE INTO table_name (column1_name, column2_name, ...) VALUES (value1, value2, ...); | Inserts new records into a table, or replaces existing records if a unique key constraint is violated | REPLACE INTO users (id, name, email) VALUES (1, 'Jane Doe', 'jane.doe@example.com'); |
DROP TABLE | DROP TABLE IF EXISTS table_name; | Deletes a table from the database | DROP TABLE IF EXISTS users; |
DROP DATABASE | DROP DATABASE IF EXISTS database_name; | Deletes a database | DROP DATABASE IF EXISTS mydatabase; |
To create a database, use the CREATE DATABASE statement, followed by the name of the database you want to create. In this example, we’re creating a database named mydatabase.
CREATE DATABASE mydatabase;
Once the database is created, you need to switch to it using the USE statement. This ensures that any subsequent operations are performed within the context of the newly created database.
USE mydatabase;
By executing these two statements, you have successfully created a new database and set it as the active database for your current session.
To create a table, use the CREATE TABLE statement followed by the name of the table you want to create. In this example, we’re creating a table named users. The table definition is enclosed in parentheses and consists of four columns: id, name, email, and registration_date.
CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    email VARCHAR(255) UNIQUE,
    registration_date DATE
);
- id: This column is defined as an integer (INT) and is set as the primary key of the table using PRIMARY KEY. The AUTO_INCREMENT attribute means that each time a new record is inserted into the table, the value of id will automatically increase by 1, starting from 1. This ensures that each record has a unique identifier.
- name and email: These columns are defined as variable-length strings using VARCHAR. The number in parentheses specifies the maximum length of the string that can be stored in each column. For name, the maximum length is 100 characters, and for email, it’s 255 characters. The UNIQUE attribute for email ensures that each email address in the table is unique and cannot be duplicated.
- registration_date: This column is defined as a DATE type, which is used to store dates. It will hold the date when each user registered.
By executing this CREATE TABLE statement, you have successfully created a new table named users with the specified columns and their properties.
To insert data into the users table, you will use the INSERT INTO statement. This statement is followed by the table name, users, and the columns where you want to insert data, which are name, email, and registration_date. The VALUES keyword is then used to specify the actual values to be inserted into these columns:
- name: 'John Doe'
- email: 'john@example.com'
- registration_date: '2025-01-10'
INSERT INTO users (name, email, registration_date)
VALUES ('John Doe', 'john@example.com', '2025-01-10');
By executing this statement, a new record will be added to the users table with the specified values.
Using an INSERT INTO statement with multiple VALUES clauses can be more efficient than executing separate INSERT INTO statements for each row. This approach reduces the number of database interactions, which can improve performance and reduce the load on the database server. Here is how to insert multiple rows into the users table in a single statement:
INSERT INTO users (name, email, registration_date)
VALUES
('Jane Smith', 'jane@example.com', '2025-01-11'),
('Emily Johnson', 'emily@example.com', '2025-01-12');
In this example, two rows are inserted into the users table with a single INSERT INTO statement. The VALUES clause is repeated for each row, separated by commas. This approach allows you to insert multiple rows in a single operation, making it more efficient than executing separate INSERT INTO statements for each row.
After inserting data into the users table, it’s essential to verify that the data has been successfully inserted and is accurate. This step ensures that the data is consistent with the expected output and helps in identifying any potential issues early on. Use the SELECT statement to retrieve all the records from the users table. The SELECT * syntax retrieves all columns (*) from the specified table (users). This allows you to view the entire dataset and confirm that the expected data is present.
SELECT * FROM users;
When you execute this statement, you should see the following output:
Output
+----+------------+-------------------+----------------+
| id | name | email | registration_date |
+----+------------+-------------------+----------------+
| 1 | John Doe | john@example.com | 2025-01-10 |
| 2 | Jane Smith | jane@example.com | 2025-01-11 |
| 3 | Emily Johnson | emily@example.com | 2025-01-12 |
+----+------------+-------------------+----------------+
Next, let's update existing records in the users table using the UPDATE statement. Use the UPDATE statement followed by the table name, the SET clause to specify the column(s) to update, and the WHERE clause to specify the condition for which records to update. Here’s an example of how to update the email address of a user with id equal to 1:
UPDATE users SET email = 'john.doe@example.com' WHERE id = 1;
After executing the UPDATE statement, it’s essential to verify that the data has been successfully updated. To do this, use the SELECT statement to retrieve the updated record(s). The SELECT * syntax retrieves all columns (*) from the specified table (users). This allows you to view the entire dataset and confirm that the expected data is present.
SELECT * FROM users;
When you execute this statement, you should see the following output, indicating that the email address of the user with id equal to 1 has been successfully updated:
Output
+----+------------+-------------------+----------------+
| id | name | email | registration_date |
+----+------------+-------------------+----------------+
| 1 | John Doe | john.doe@example.com | 2025-01-10 |
| 2 | Jane Smith | jane@example.com | 2025-01-11 |
| 3 | Emily Johnson | emily@example.com | 2025-01-12 |
+----+------------+-------------------+----------------+
<?php
// Assuming $conn is a valid MySQL connection
if (isset($_POST['register'])) {
$name = $_POST['name'];
$email = $_POST['email'];
$password = $_POST['password']; // Assuming password is hashed for security
$query = "INSERT INTO users (name, email, password) VALUES (?, ?, ?)";
$stmt = $conn->prepare($query);
$stmt->bind_param("sss", $name, $email, $password);
$stmt->execute();
$stmt->close();
}
?>
You can refer to this tutorial on how to install the LAMP stack on Ubuntu to learn how to install PHP and MySQL on Ubuntu. Here is a similar example in Node.js:
const mysql = require('mysql');
// Assuming db is a valid MySQL connection
const insertCustomerInteraction = (customerID, interactionType, interactionDate) => {
const query = "INSERT INTO customer_interactions (customer_id, interaction_type, interaction_date) VALUES (?, ?, ?)";
db.query(query, [customerID, interactionType, interactionDate], (error, results, fields) => {
if (error) throw error;
console.log('Customer interaction inserted successfully');
});
};
You can also refer to this tutorial on how to install Linux, Nginx, MySQL, PHP (LEMP stack) on Ubuntu to learn how to install Node.js and MySQL on Ubuntu.
To avoid an error when the table already exists, use the IF NOT EXISTS clause in your CREATE TABLE statement. Here’s an example:
CREATE TABLE IF NOT EXISTS users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL
);
Incorrect example:
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    age VARCHAR(3) NOT NULL -- Incorrect data type for age, should be INT
);
INSERT INTO users (name, email, age) VALUES ('John Doe', 'john.doe@example.com', 'twenty-five'); -- This will result in an error due to the incorrect data type for age
Corrected example:
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    age INT NOT NULL -- Correct data type for age
);
INSERT INTO users (name, email, age) VALUES ('John Doe', 'john.doe@example.com', 25); -- Correct insertion with the right data type for age
Another common mistake is a syntax error, such as a missing parenthesis. Incorrect example:
INSERT INTO users (name, email, age VALUES ('John Doe', 'john.doe@example.com', 25); -- Missing closing parenthesis
Corrected example:
INSERT INTO users (name, email, age) VALUES ('John Doe', 'john.doe@example.com', 25); -- Correctly formatted SQL statement
INSERT, INSERT IGNORE, and REPLACE
MySQL provides the INSERT, INSERT IGNORE, and REPLACE statements. Each of these statements serves a unique purpose in managing data insertion into tables. Here’s a detailed explanation of each statement, along with examples and a comparison table at the end.
The INSERT statement inserts a new row into a table. If the row already exists, it will throw an error. This is the most common method of inserting data into a table.
INSERT INTO users (name, email) VALUES ('John Doe', 'john.doe@example.com');
INSERT IGNORE is similar to INSERT, but it ignores the error if the row already exists. This can be useful when you want to insert a row only if it doesn’t already exist. If the row already exists, the statement will silently ignore the insertion attempt.
INSERT IGNORE INTO users (name, email) VALUES ('John Doe', 'john.doe@example.com');
REPLACE works similarly to INSERT, but if the row already exists, it replaces the existing row with the new data. This statement is particularly useful when you need to update existing data or ensure that duplicate rows are not inserted.
REPLACE INTO users (name, email) VALUES ('John Doe', 'john.doe@example.com');
Here is a comparison of INSERT, INSERT IGNORE, and REPLACE:
Statement | Behavior if Row Exists | Error Handling |
---|---|---|
INSERT | Throws an error | Raises an error |
INSERT IGNORE | Ignores the insertion | Silently ignores the error |
REPLACE | Replaces the existing row | Does not raise an error; inserts a new row if none exists |
- Use INSERT when you want to ensure that a row is inserted only if it doesn’t already exist, and you want to handle errors explicitly.
- Use INSERT IGNORE when you want to insert a row only if it doesn’t already exist, and you don’t care about handling errors.
- Use REPLACE when you want to ensure that a row is inserted, or updated if it already exists, and you want to handle errors explicitly.
To use prepared statements from PHP, you can use mysqli or PDO to prepare and execute statements. Here’s a concise example using mysqli:
<?php
// Prepare an SQL statement to insert a new user into the 'users' table
$stmt = $conn->prepare("INSERT INTO users (name, email) VALUES (?, ?)");
// Bind the parameters to the SQL statement, specifying the types of the variables
$stmt->bind_param("ss", $name, $email);
// Assign values to the variables
$name = 'Jane Doe';
$email = 'jane.doe@example.com';
// Execute the prepared statement
$stmt->execute();
// Close the prepared statement
$stmt->close();
?>
For in-depth information on prepared statements, please refer to the How To Use Stored Procedures in MySQL and PHP documentation for mysqli or PDO usage.
For example, to create a simple table and insert multiple rows at once:
CREATE TABLE users (
name VARCHAR(255),
email VARCHAR(255)
);
INSERT INTO users (name, email) VALUES ('John Doe', 'john.doe@example.com'), ('Jane Doe', 'jane.doe@example.com');
What is the difference between CHAR and VARCHAR in MySQL?
CHAR and VARCHAR are both character data types in MySQL, but they differ in how they store and handle data. CHAR is a fixed-length string that always occupies the same space, padding with spaces if necessary. For example, CHAR(10) will always store 10 characters, even if the actual data is shorter. VARCHAR, on the other hand, is a variable-length string that only occupies the space needed to store the actual data. It’s more efficient for storing strings of varying lengths.
Here is an example using both CHAR and VARCHAR:
CREATE TABLE users (
name CHAR(10),
email VARCHAR(255)
);
CREATE TABLE new_users SELECT * FROM users;
This will create a new table new_users with the same structure and data as the users table.
To create a new database:
CREATE DATABASE mydatabase;
This will create a new database named mydatabase.
Model | Training data | Max duration (batches) | Ratio params / 125M params | Ratio batches / 125M batches | Evaluation interval (batches) | No. nodes | Actual runtime (wallclock) | Actual runtime (s) | Ratio runtime / 125M runtime | Throughput (tokens/s) | Model FLOPS utilization (MFU) | Memory per GPU (from 82GB) | Checkpoint size | Conversion, inference & evaluation ok? | Evaluation accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MPT-125M | C4 | 4800 | 1 | 1 | 1000 | 8 | 9m7.873s | 547.9 | 1 | 6,589,902 | ~0.1 | 13.4 | 1.5G | Y | 0.53 |
MPT-350M | C4 | 13400 | 2.8 | 2.8 | 1000 | 8 | 38m10.823s | 2291 | 4.18 | 3,351,644 | ~0.145 | 8.91 | 4.0G | Y | 0.56 |
MPT-760M | C4 | 29000 | 6.08 | 6.0 | 2000 | 8 | 103m23.136s | 6203 | 11.32 | 2,737,276 | ~0.27 | 12.5 | 8.6G | Y | 0.56 |
MPT-1B | C4 | 24800 | 8 | 5.2 | 2000 | 8 | 208m24.319s | 12504 | 22.82 | 2,368,224 | ~0.33 | 16.3 | 15G | Y | 0.58 |
Model | Finetuning data | Max training duration (epochs) | Evaluation Interval (epochs) | No. Nodes | Actual runtime (wallclock) | Actual runtime (s) | Speedup versus one node | Throughput (tokens/s) | Memory per GPU (from 82GB) | Inference & evaluation ok? | Evaluation accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|
MPT-7B-Dolly-SFT | mosaicml/dolly_hhrlhf | 2 | 1 | 1 | 78m28.121s | 4708 | - | 7124 | 24.9 | Y | 0.85 |
MPT-7B-Dolly-SFT | mosaicml/dolly_hhrlhf | 2 | 1 | 2 | 29m24.485s | 1764 | 2.67x | 13,844 | 19.9 | Y | 0.84 |
MPT-7B-Dolly-SFT | mosaicml/dolly_hhrlhf | 2 | 1 | 4 | 18m21.026s | 1101 | 4.28x | 28,959 | 17.5 | Y | 0.84 |
MPT-7B-Dolly-SFT | mosaicml/dolly_hhrlhf | 2 | 1 | 8 | 13m35.352s | 815 | 5.77x | 50,708 | 9.37 | Y | 0.84 |
MPT-30B-Instruct | kowndinya23/instruct-v3 | 2 | 1 | 8 | 125m12.579s | 7513 | 3.76x | 52,022 | ~36 | Y | 0.85 |
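The "Speedup versus one node" column above appears to be the single-node wallclock runtime divided by the N-node runtime; a quick sketch reproducing those figures from the table values:
# Runtimes (s) copied from the MPT-7B-Dolly-SFT rows of the finetuning table above.
runtimes_s = {1: 4708, 2: 1764, 4: 1101, 8: 815}   # nodes -> actual runtime (s)

baseline = runtimes_s[1]
for nodes, seconds in runtimes_s.items():
    print(f"{nodes} node(s): {baseline / seconds:.2f}x speedup over one node")
# Prints 1.00x, 2.67x, 4.28x, 5.78x, matching the table up to rounding.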
mpirun \
-H hostfile \
-np 128 \
-N 8 \
--allow-run-as-root \
-x NCCL_IB_PCI_RELAXED_ORDERING=1 \
-x NCCL_IB_CUDA_SUPPORT=1 \
-x NCCL_IB_HCA=^mlx5_1,mlx5_2,mlx5_7,mlx5_8 \
-x NCCL_CROSS_NIC=0 -x NCCL_IB_GID_INDEX=1 \
$(pwd)/nccl-tests/build/all_reduce_perf -b 8 -e 8G -f 2 -g 1
The results are tabulated below for the case of 16 nodes. The out-of-place (OOP) and in-place (IP) results each report time, algorithmic bandwidth (Algbw), bus bandwidth (Busbw), and error count (#Wrong).
Size (B) | Count (elements) | Type | Redop | Root | OOP Time (us) | OOP Algbw (GB/s) | OOP Busbw (GB/s) | OOP #Wrong | IP Time (us) | IP Algbw (GB/s) | IP Busbw (GB/s) | IP #Wrong |
---|---|---|---|---|---|---|---|---|---|---|---|---|
8 | 2 | float | sum | -1 | 63.25 | 0.00 | 0.00 | 0 | 65.28 | 0.00 | 0.00 | 0 |
16 | 4 | float | sum | -1 | 63.10 | 0.00 | 0.00 | 0 | 62.37 | 0.00 | 0.00 | 0 |
32 | 8 | float | sum | -1 | 62.90 | 0.00 | 0.00 | 0 | 63.54 | 0.00 | 0.00 | 0 |
64 | 16 | float | sum | -1 | 63.23 | 0.00 | 0.00 | 0 | 63.40 | 0.00 | 0.00 | 0 |
128 | 32 | float | sum | -1 | 64.08 | 0.00 | 0.00 | 0 | 63.23 | 0.00 | 0.00 | 0 |
256 | 64 | float | sum | -1 | 63.81 | 0.00 | 0.01 | 0 | 63.33 | 0.00 | 0.01 | 0 |
512 | 128 | float | sum | -1 | 67.62 | 0.01 | 0.02 | 0 | 66.06 | 0.01 | 0.02 | 0 |
1024 | 256 | float | sum | -1 | 71.55 | 0.01 | 0.03 | 0 | 70.99 | 0.01 | 0.03 | 0 |
2048 | 512 | float | sum | -1 | 76.07 | 0.03 | 0.05 | 0 | 74.32 | 0.03 | 0.05 | 0 |
4096 | 1024 | float | sum | -1 | 75.73 | 0.05 | 0.11 | 0 | 76.28 | 0.05 | 0.11 | 0 |
8192 | 2048 | float | sum | -1 | 77.84 | 0.11 | 0.21 | 0 | 75.27 | 0.11 | 0.22 | 0 |
16384 | 4096 | float | sum | -1 | 78.70 | 0.21 | 0.41 | 0 | 75.98 | 0.22 | 0.43 | 0 |
32768 | 8192 | float | sum | -1 | 81.08 | 0.40 | 0.80 | 0 | 76.56 | 0.43 | 0.85 | 0 |
65536 | 16384 | float | sum | -1 | 80.14 | 0.82 | 1.62 | 0 | 77.50 | 0.85 | 1.68 | 0 |
131072 | 32768 | float | sum | -1 | 91.96 | 1.43 | 2.83 | 0 | 95.47 | 1.37 | 2.72 | 0 |
262144 | 65536 | float | sum | -1 | 108.5 | 2.42 | 4.79 | 0 | 106.5 | 2.46 | 4.88 | 0 |
524288 | 131072 | float | sum | -1 | 113.9 | 4.60 | 9.13 | 0 | 113.6 | 4.62 | 9.16 | 0 |
1048576 | 262144 | float | sum | -1 | 122.6 | 8.55 | 16.97 | 0 | 121.3 | 8.64 | 17.15 | 0 |
2097152 | 524288 | float | sum | -1 | 140.5 | 14.92 | 29.61 | 0 | 140.8 | 14.89 | 29.55 | 0 |
4194304 | 1048576 | float | sum | -1 | 179.8 | 23.33 | 46.29 | 0 | 178.8 | 23.45 | 46.54 | 0 |
8388608 | 2097152 | float | sum | -1 | 241.4 | 34.75 | 68.96 | 0 | 239.9 | 34.96 | 69.38 | 0 |
16777216 | 4194304 | float | sum | -1 | 343.9 | 48.78 | 96.80 | 0 | 343.0 | 48.92 | 97.07 | 0 |
33554432 | 8388608 | float | sum | -1 | 548.5 | 61.18 | 121.40 | 0 | 550.1 | 61.00 | 121.04 | 0 |
67108864 | 16777216 | float | sum | -1 | 943.5 | 71.13 | 141.15 | 0 | 940.8 | 71.33 | 141.55 | 0 |
134217728 | 33554432 | float | sum | -1 | 1490.7 | 90.04 | 178.67 | 0 | 1489.5 | 90.11 | 178.81 | 0 |
268435456 | 67108864 | float | sum | -1 | 2547.9 | 105.36 | 209.07 | 0 | 2549.8 | 105.28 | 208.91 | 0 |
536870912 | 134217728 | float | sum | -1 | 4241.8 | 126.57 | 251.16 | 0 | 4248.9 | 126.35 | 250.73 | 0 |
1073741824 | 268435456 | float | sum | -1 | 6753.1 | 159.00 | 315.52 | 0 | 6739.1 | 159.33 | 316.17 | 0 |
2147483648 | 536870912 | float | sum | -1 | 12466 | 172.26 | 341.83 | 0 | 12383 | 173.43 | 344.14 | 0 |
4294967296 | 1073741824 | float | sum | -1 | 23774 | 180.65 | 358.49 | 0 | 23871 | 179.93 | 357.04 | 0 |
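For reference, the Busbw column for all_reduce is derived from Algbw using the nccl-tests convention busbw = algbw * 2(n-1)/n, where n is the number of ranks (16 nodes x 8 GPUs = 128 here); a quick check against the largest message size:
# Relationship between the Algbw and Busbw columns for all_reduce (nccl-tests convention).
n_ranks = 16 * 8                       # 16 nodes x 8 GPUs
algbw = 180.65                         # GB/s, out-of-place, 4 GiB row above
busbw = algbw * 2 * (n_ranks - 1) / n_ranks
print(f"Predicted busbw: {busbw:.2f} GB/s (table reports 358.49 GB/s)")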
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
# Load dataset and split into train/validation sets
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
# Train a baseline SVM model with default hyperparameters
model = SVC(kernel='rbf', probability=True, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_val)
# Evaluate baseline performance
print("Baseline accuracy:", accuracy_score(y_val, y_pred))
print("Baseline F1-score:", f1_score(y_val, y_pred))
print("Baseline AUC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
Output:
Baseline accuracy: 0.9298245614035088
Baseline F1-score: 0.9459459459459459
Baseline AUC: 0.9695767195767195
The model demonstrates a 92% validation classification accuracy using default parameters, while the F1-score (~0.94) reflects a good precision-recall balance. The AUC (~0.96) shows excellent ranking performance. You can use these metrics as the baseline for comparison.
for C in [0.01, 0.1, 1, 10, 100]:
    model = SVC(kernel='rbf', C=C, probability=True, random_state=42)
    model.fit(X_train, y_train)
    val_ac = accuracy_score(y_val, model.predict(X_val))
    print(f"C = {C:<5} | Validation Accuracy = {val_ac:.3f}")
Output:
C = 0.01 | Validation Accuracy = 0.842
C = 0.1 | Validation Accuracy = 0.912
C = 1 | Validation Accuracy = 0.930
C = 10 | Validation Accuracy = 0.930
C = 100 | Validation Accuracy = 0.947
Low C (e.g., 0.01) applies strong regularization → underfitting (low accuracy), while larger C values relax the regularization and improve validation accuracy.
for gamma in [1e-4, 1e-3, 1e-2, 0.1, 1]:
    model = SVC(kernel='rbf', C=1, gamma=gamma, probability=True, random_state=42)
    model.fit(X_train, y_train)
    val_ac = accuracy_score(y_val, model.predict(X_val))
    print(f"gamma = {gamma:<6} | Validation Accuracy = {val_ac:.3f}")
Output:
gamma = 0.0001 | Validation Accuracy = 0.930
gamma = 0.001 | Validation Accuracy = 0.895
gamma = 0.01 | Validation Accuracy = 0.640
gamma = 0.1 | Validation Accuracy = 0.632
gamma = 1 | Validation Accuracy = 0.632
- γ = 1e-4: Low γ values produce smooth decision boundaries that prevent overfitting while delivering the highest accuracy of 0.930.
- γ ≥ 1e-3: As γ increases, the RBF kernel becomes more sensitive to individual points and risks overfitting, which reduces generalization, as shown by the accuracy dropping to ~0.895 and below.
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
param_grid = {
"C": [0.1, 1, 10, 50],
"gamma": [1e-4, 1e-3, 0.01, 0.1]
}
best_ac = 0.0
best_f1 = 0.0
best_auc = 0.0
best_params = {}
for C in param_grid["C"]:
    for gamma in param_grid["gamma"]:
        model = SVC(kernel='rbf',
                    C=C,
                    gamma=gamma,
                    probability=True,
                    random_state=42)
        model.fit(X_train, y_train)
        # Predictions and probabilities
        y_v_pred = model.predict(X_val)
        y_v_proba = model.predict_proba(X_val)[:, 1]
        # Metrics computation
        ac = accuracy_score(y_val, y_v_pred)
        f1 = f1_score(y_val, y_v_pred)
        auc = roc_auc_score(y_val, y_v_proba)
        # Track the best by accuracy, or change to F1/AUC as needed
        if ac > best_ac:
            best_ac = ac
            best_f1 = f1
            best_auc = auc
            best_params = {"C": C, "gamma": gamma}
        print(f"C={C:<4} gamma={gamma:<6} => "
              f"Accuracy={ac:.3f} F1={f1:.3f} AUC={auc:.3f}")
print(
"\nBest combo:", best_params,
f"with Accuracy={best_ac:.3f}, F1={best_f1:.3f}, AUC={best_auc:.3f}"
)
Output:
Best combo: {'C': 1, 'gamma': 0.0001} with Accuracy=0.930, F1=0.944, AUC=0.958
The grid search matched baseline accuracy, yet the results showed slight reductions in F1 and AUC scores. This indicates that the SVM default hyperparameters are near optimal for F1/AUC, and the manual grid search was too coarse or targeted the wrong regions.
import random
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
def random_search_svm(X_train, y_train, X_val, y_val, ntrials=10):
    """
    Run random search across C and gamma parameters for an RBF SVM model.
    Monitor optimal hyperparameters for peak Accuracy, F1-score, and ROC AUC values.
    """
    best = {
        'accuracy': {'score': 0, 'params': {}},
        'f1': {'score': 0, 'params': {}},
        'auc': {'score': 0, 'params': {}}
    }
    for i in range(1, ntrials + 1):
        # Log-uniform sampling
        C = 10 ** random.uniform(-1, 2)        # 0.1 to 100
        gamma = 10 ** random.uniform(-5, -2)   # 1e-5 to 1e-2
        # Model training
        model = SVC(kernel='rbf', C=C, gamma=gamma,
                    probability=True, random_state=42)
        model.fit(X_train, y_train)
        # Prediction and evaluation
        y_pred = model.predict(X_val)
        y_proba = model.predict_proba(X_val)[:, 1]
        ac = accuracy_score(y_val, y_pred)
        f1 = f1_score(y_val, y_pred)
        auc = roc_auc_score(y_val, y_proba)
        # Print trial results
        print(f"Trial {i}: C={C:.4f}, gamma={gamma:.5f} | "
              f"Acc={ac:.3f}, F1={f1:.3f}, AUC={auc:.3f}")
        # For each metric, we will update the best
        if ac > best['accuracy']['score']:
            best['accuracy'].update({'score': ac, 'params': {'C': C, 'gamma': gamma}})
        if f1 > best['f1']['score']:
            best['f1'].update({'score': f1, 'params': {'C': C, 'gamma': gamma}})
        if auc > best['auc']['score']:
            best['auc'].update({'score': auc, 'params': {'C': C, 'gamma': gamma}})
    # For each metric, print summary of best hyperparameters
    print("\nBest hyperparameters by metric:")
    for metric, info in best.items():
        params = info['params']
        score = info['score']
        print(f"- {metric.capitalize()}: Score={score:.3f}, Params= C={params.get('C'):.4f}, gamma={params.get('gamma'):.5f}")
When calling the function random_search_svm(X_train, y_train, X_val, y_val, ntrials=20) to perform a more thorough search, you will get something like this:
Best hyperparameters by metric:
- Accuracy: Score=0.939, Params= C=67.2419, gamma=0.00007
- F1: Score=0.951, Params= C=59.5889, gamma=0.00002
- Auc: Score=0.987, Params= C=59.5889, gamma=0.00002
Note that different runs may yield different results because of randomness in trials.
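If you want the trials themselves to be repeatable from run to run, you can seed Python's random module before calling the function; a minimal sketch reusing the random_search_svm defined above (the seed value is arbitrary):
import random

random.seed(0)   # arbitrary seed; random.uniform inside random_search_svm becomes deterministic
random_search_svm(X_train, y_train, X_val, y_val, ntrials=20)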
The results show which (C, γ) pairs achieve optimal performance across each key metric. This straightforward random search method reliably finds hyperparameter configurations that outperform existing baselines and coarse grid search results.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
import numpy as np
# 1. Data loading and splitting into train+val vs. test
X, y = load_breast_cancer(return_X_y=True)
X_tem, X_test, y_tem, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
# 2. Splitting X_tem into training and validation sets. We want to reproduce the previous split
X_train, X_val, y_train, y_val = train_test_split(
X_tem, y_tem, test_size=0.25, random_state=42, stratify=y_tem
)
# Note that 0.25 of 80% gives us 20% validation, which matches our original 80/20 split
# 3. We merge train and validation data for our final testing
X_merged = np.vstack([X_train, X_val])
y_merged = np.hstack([y_train, y_val])
# 4. Retrain using our best hyperparameters
best_C = 59.5889 # replace with your chosen C
best_gamma = 2e-05 # replace with your chosen gamma
f_model = SVC(
kernel='rbf',
C=best_C,
gamma=best_gamma,
probability=True,
random_state=42
)
f_model.fit(X_merged, y_merged)
# 5. Evaluate on hold-out test set
y_test_pred = f_model.predict(X_test)
y_test_proba = f_model.predict_proba(X_test)[:, 1]
test_ac = accuracy_score(y_test, y_test_pred)
test_f1 = f1_score(y_test, y_test_pred)
test_auc = roc_auc_score(y_test, y_test_proba)
print("Final model test accuracy: ", test_ac)
print("Final model test F1-score: ", test_f1)
print("Final model test ROC AUC: ", test_auc)
Running this block will produce performance metrics from the hold-out test set, which show how the tuned SVM model performs on new data. You can check the final accuracy, F1, and AUC scores against the baseline and grid/random search results to confirm meaningful real-world improvements from the selected configuration.
As another example, here is a learning-rate sweep for a Gradient Boosting classifier on the same split:
from sklearn.ensemble import GradientBoostingClassifier
for lr in [0.001, 0.01, 0.1, 0.3, 0.7, 1.0]:
    model = GradientBoostingClassifier(n_estimators=50, learning_rate=lr, random_state=42)
    model.fit(X_train, y_train)
    val_acc = accuracy_score(y_val, model.predict(X_val))
    print(f"Learning rate {lr:.3f} => Validation Accuracy = {val_acc:.3f}")
Output:
Learning rate 0.001 => Validation Accuracy = 0.632
Learning rate 0.010 => Validation Accuracy = 0.939
Learning rate 0.100 => Validation Accuracy = 0.947
Learning rate 0.300 => Validation Accuracy = 0.956
Learning rate 0.700 => Validation Accuracy = 0.956
Learning rate 1.000 => Validation Accuracy = 0.965
At a learning rate of 0.001, the model learns too slowly to improve performance. Accuracy improves sharply once the learning rate is raised to 0.01, and further at 0.1 and above. With learning rates between 0.3 and 1.0 we reach about 95-96% validation accuracy, and the highest validation accuracy is observed at a learning rate of 1.0.
Best Practice | Description |
---|---|
Coarse-to-Fine Search | Begin with a wide, log-scale range (very low to very high). Identify the region with best performance, then “zoom in” with smaller increments around that region. |
One Change at a Time | Vary only a single hyperparameter per experiment. This isolation makes it clear which parameter caused any observed performance change. |
Keep a Log | Record each hyperparameter setting and its results (e.g., in a notebook or printed output). This prevents duplicate trials and helps spot trends over time. |
Use Validation Effectively | Always evaluate on held-out data. If data is limited, apply k-fold cross-validation. This ensures improvements reflect true generalization rather than fitting noise. |
Mind Interactions | After isolating individual effects, explore combinations (e.g., learning rate + batch size, number of trees + learning rate). Hyperparameters often interact, and the optimal pair may differ from the individual optima. |
Don’t Tune Too Many at Once | Focus on 2–3 key hyperparameters, leaving less critical ones at default values. |
Stop When Returns Diminish | Set a target improvement (e.g., +2% accuracy). If further trials yield only marginal gains, conclude manual tuning or switch to automated methods for finer optimization. |
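For the "Use Validation Effectively" row, here is a minimal k-fold sketch; it reuses the breast-cancer data from earlier, and the C/gamma values are simply the ones our random search happened to find, not recommended defaults:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
model = SVC(kernel='rbf', C=59.5889, gamma=2e-05, probability=True, random_state=42)

# 5-fold cross-validation gives a less noisy estimate than a single train/validation split.
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")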
apt install python3-pip python3.10-venv
git clone https://github.com/River-Zhang/ICEdit
cd ICEdit
pip3 install -r requirements.txt
pip3 install -U huggingface_hub
huggingface-cli login
You will be prompted for your token at this step; paste it and follow the instructions on screen. Then launch the demo:
python3 scripts/gradio_demo.py --share
There will be a share link which you can access from your browser.