<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>RossPedia</title>
        <link>https://ross.selfcoding.cn/</link>
        <description></description>
        <lastBuildDate>Tue, 07 Mar 2023 14:24:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>zh-CN</language>
        <copyright>All rights reserved 2023, Ross</copyright>
        <item>
            <title><![CDATA[Seaborn 画时序数据图]]></title>
            <link>https://ross.selfcoding.cn/article/seaborn</link>
            <guid>c5574014-fe63-40bc-94d3-77a4c8db4684</guid>
            <pubDate>Thu, 02 Mar 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[seaborn时序数据可视化]]></description>
            <content:encoded><![CDATA[<div id="container" class="max-w-5xl font-medium mx-auto undefined"><main class="notion light-mode notion-page notion-block-c5574014fe6340bc94d377a4c8db4684"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-text notion-block-d274a9642ac548728b4d55dd9052546f">数据可视化在机器学习中是一项关键技术，它可以帮助我们更好地理解数据和模型的行为，并支持我们在模型选择、调整和解释过程中做出更加明智的决策。</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-0964607c00a849cdb157693c996c3826" data-id="0964607c00a849cdb157693c996c3826"><span><div id="0964607c00a849cdb157693c996c3826" class="notion-header-anchor"></div><a class="notion-hash-link" href="#0964607c00a849cdb157693c996c3826" title="时序数据可视化"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">时序数据可视化</span></span></h2><div class="notion-text notion-block-6c2dc47176c44c5caefef08e40e2dd8d">时序数据比较常用的是二维的折线图，下面用 seaborn 折线图可视化。</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-c2271e1e9ac7412d93eee80ee2a95c4d"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:500px;max-width:100%;flex-direction:column"><img style="object-fit:cover" 
src="https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F3bea6a56-a1d9-4e79-95da-59a35cf2e1f7%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20230307%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20230307T142358Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3Dba2e12d4aa4524a79ccc78c24efcbac37ad0a242a01a051eb026fbbfb0a052f6%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&amp;id=c2271e1e-9ac7-412d-93ee-e80ee2a95c4d" alt="notion image" loading="lazy" decoding="async"/></div></figure><pre class="notion-code"><div class="notion-code-copy"><div class="notion-code-copy-button"><svg fill="currentColor" viewBox="0 0 16 16" width="1em" version="1.1"><path fill-rule="evenodd" d="M0 6.75C0 5.784.784 5 1.75 5h1.5a.75.75 0 010 1.5h-1.5a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-1.5a.75.75 0 011.5 0v1.5A1.75 1.75 0 019.25 16h-7.5A1.75 1.75 0 010 14.25v-7.5z"></path><path fill-rule="evenodd" d="M5 1.75C5 .784 5.784 0 6.75 0h7.5C15.216 0 16 .784 16 1.75v7.5A1.75 1.75 0 0114.25 11h-7.5A1.75 1.75 0 015 9.25v-7.5zm1.75-.25a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-7.5a.25.25 0 00-.25-.25h-7.5z"></path></svg></div></div><code class="language-python">from typing import List
import seaborn as sns
import matplotlib.pyplot as plt


def plot_lines(x: List[float], y1: List[float], y2: List[float],
               y3: List[float], show: bool = False, save_path: str = None):
    &quot;&quot;&quot;
    Plot a dual-y-axis line chart: y1 and y2 on the left axis, y3 on the right.
    :param x: shared x values
    :param y1: first series, drawn on the left axis
    :param y2: second series, drawn on the left axis
    :param y3: third series, drawn on the right axis
    :param show: display the figure when True
    :param save_path: save the figure to this path when given
    :return: None
    &quot;&quot;&quot;
    sns.set_style(&#x27;darkgrid&#x27;)
    assert len(x) == len(y1) == len(y2) == len(y3)
    fig, ax = plt.subplots(figsize=(10, 10))
    sns.lineplot(x=x, y=y1, label=&#x27;y1&#x27;, color=&#x27;green&#x27;, ax=ax)
    sns.lineplot(x=x, y=y2, label=&#x27;y2&#x27;, color=&#x27;blue&#x27;, ax=ax)

    ax.hlines(1.0, x[0], x[-1], colors=&quot;black&quot;, linestyles=&quot;dashed&quot;)
    ax2 = ax.twinx()
    sns.lineplot(x=x, y=y3, label=&#x27;y3&#x27;, color=&#x27;red&#x27;, ax=ax2)
    lines_1, labels_1 = ax.get_legend_handles_labels()
    lines_2, labels_2 = ax2.get_legend_handles_labels()

    lines = lines_1 + lines_2
    labels = labels_1 + labels_2
    ax2.legend().remove()
    ax.legend(lines, labels, loc=&#x27;upper left&#x27;)

    ax.set_xlabel(&#x27;X&#x27;)
    ax.set_ylabel(&#x27;Y_left&#x27;)
    ax2.set_ylabel(&#x27;Y_right&#x27;)
    ax.set_title(&#x27;Title&#x27;)
    if show:
        print(&#x27;showing plot...&#x27;)
        # fig.show() needs an interactive backend and returns immediately;
        # plt.show() blocks and also works in a plain script
        plt.show()
    if save_path is not None:
        fig.savefig(save_path, bbox_inches=&#x27;tight&#x27;)


if __name__ == &#x27;__main__&#x27;:
    import numpy as np

    n_sample = 100
    x = list(range(n_sample))
    y1 = np.random.random(n_sample)
    y1 = np.cumsum(y1)
    y2 = np.random.random(n_sample)
    y2 = np.cumsum(y2)
    y3 = list(range(n_sample))
    plot_lines(x, y1, y2, y3, show=True)</code></pre><div class="notion-text notion-block-9a77e277108e4a7a826c1f1a16c22195">代码解析：</div><ul class="notion-list notion-list-disc notion-block-12464cd4b5164a41add446aa817c063f"><li>第一步：在图上画出绿色和蓝色线，并用 <code class="notion-inline-code">hlines</code> 画一条 y = 1.0 的黑色橫虚线。</li><ul class="notion-list notion-list-disc notion-block-12464cd4b5164a41add446aa817c063f"><pre class="notion-code"><div class="notion-code-copy"><div class="notion-code-copy-button"><svg fill="currentColor" viewBox="0 0 16 16" width="1em" version="1.1"><path fill-rule="evenodd" d="M0 6.75C0 5.784.784 5 1.75 5h1.5a.75.75 0 010 1.5h-1.5a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-1.5a.75.75 0 011.5 0v1.5A1.75 1.75 0 019.25 16h-7.5A1.75 1.75 0 010 14.25v-7.5z"></path><path fill-rule="evenodd" d="M5 1.75C5 .784 5.784 0 6.75 0h7.5C15.216 0 16 .784 16 1.75v7.5A1.75 1.75 0 0114.25 11h-7.5A1.75 1.75 0 015 9.25v-7.5zm1.75-.25a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-7.5a.25.25 0 00-.25-.25h-7.5z"></path></svg></div></div><code class="language-python">fig, ax = plt.subplots(figsize=(10, 10))
sns.lineplot(x=x, y=y1, label=&#x27;y1&#x27;, color=&#x27;green&#x27;, ax=ax)
sns.lineplot(x=x, y=y2, label=&#x27;y2&#x27;, color=&#x27;blue&#x27;, ax=ax)
ax.hlines(1.0, x[0], x[-1], colors=&quot;black&quot;, linestyles=&quot;dashed&quot;)</code></pre></ul></ul><ul class="notion-list notion-list-disc notion-block-5a8cf6bdd0bb4aedb58b0284814dbf0f"><li>第二步：在图上新建一个 y 轴坐标在右边的 axe，并画线</li><ul class="notion-list notion-list-disc notion-block-5a8cf6bdd0bb4aedb58b0284814dbf0f"><pre class="notion-code"><div class="notion-code-copy"><div class="notion-code-copy-button"><svg fill="currentColor" viewBox="0 0 16 16" width="1em" version="1.1"><path fill-rule="evenodd" d="M0 6.75C0 5.784.784 5 1.75 5h1.5a.75.75 0 010 1.5h-1.5a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-1.5a.75.75 0 011.5 0v1.5A1.75 1.75 0 019.25 16h-7.5A1.75 1.75 0 010 14.25v-7.5z"></path><path fill-rule="evenodd" d="M5 1.75C5 .784 5.784 0 6.75 0h7.5C15.216 0 16 .784 16 1.75v7.5A1.75 1.75 0 0114.25 11h-7.5A1.75 1.75 0 015 9.25v-7.5zm1.75-.25a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-7.5a.25.25 0 00-.25-.25h-7.5z"></path></svg></div></div><code class="language-python">ax2 = ax.twinx()
sns.lineplot(x=x, y=y3, label=&#x27;y3&#x27;, color=&#x27;red&#x27;, ax=ax2)</code></pre></ul></ul><ul class="notion-list notion-list-disc notion-block-b1f2d93b04a24caba0c543f9c39f2421"><li>第三步：合并 ax 和 ax2 的图例</li><ul class="notion-list notion-list-disc notion-block-b1f2d93b04a24caba0c543f9c39f2421"><pre class="notion-code"><div class="notion-code-copy"><div class="notion-code-copy-button"><svg fill="currentColor" viewBox="0 0 16 16" width="1em" version="1.1"><path fill-rule="evenodd" d="M0 6.75C0 5.784.784 5 1.75 5h1.5a.75.75 0 010 1.5h-1.5a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-1.5a.75.75 0 011.5 0v1.5A1.75 1.75 0 019.25 16h-7.5A1.75 1.75 0 010 14.25v-7.5z"></path><path fill-rule="evenodd" d="M5 1.75C5 .784 5.784 0 6.75 0h7.5C15.216 0 16 .784 16 1.75v7.5A1.75 1.75 0 0114.25 11h-7.5A1.75 1.75 0 015 9.25v-7.5zm1.75-.25a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-7.5a.25.25 0 00-.25-.25h-7.5z"></path></svg></div></div><code class="language-python"># 得到两个 axe 的线和坐标
lines_1, labels_1 = ax.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()

lines = lines_1 + lines_2
labels = labels_1 + labels_2
# drop the legend ax2 created on its own, otherwise y3 shows up twice in the combined upper-left legend
ax2.legend().remove()
ax.legend(lines, labels, loc=&#x27;upper left&#x27;)</code></pre></ul></ul><div class="notion-blank notion-block-a6051f3c2d0b46b79ad698d58277b59b"> </div><div class="notion-text notion-block-473af5ae63dc49a08ed31eb4af09fe5e">如果我们把 x 轴的数据替换成 datetime 对象，则效果如下</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-70fad111c8fb4662ac82a6ae183a9d58"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:500px;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F6be1422c-2e8b-43bc-abe2-d95cbb0a8a21%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20230307%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20230307T142358Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D6f4fc6dcd8a0d40a1111eca7c3d827fb85a74077c9fbef59c1faaed602e6e915%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&amp;id=70fad111-c8fb-4662-ac82-a6ae183a9d58" alt="notion image" loading="lazy" decoding="async"/></div></figure><pre class="notion-code"><div class="notion-code-copy"><div class="notion-code-copy-button"><svg fill="currentColor" viewBox="0 0 16 16" width="1em" version="1.1"><path fill-rule="evenodd" d="M0 6.75C0 5.784.784 5 1.75 5h1.5a.75.75 0 010 1.5h-1.5a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-1.5a.75.75 0 011.5 0v1.5A1.75 1.75 0 019.25 16h-7.5A1.75 1.75 0 010 14.25v-7.5z"></path><path fill-rule="evenodd" d="M5 1.75C5 .784 5.784 0 6.75 0h7.5C15.216 0 16 .784 16 1.75v7.5A1.75 1.75 0 0114.25 11h-7.5A1.75 1.75 0 015 9.25v-7.5zm1.75-.25a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-7.5a.25.25 0 00-.25-.25h-7.5z"></path></svg></div></div><code class="language-python">if __name__ == &#x27;__main__&#x27;:
    import numpy as np
    import datetime
    n_sample = 100

    base = datetime.datetime.today()
    x = [base + datetime.timedelta(days=i) for i in range(n_sample)]

    y1 = np.random.random(n_sample)
    y1 = np.cumsum(y1)
    y2 = np.random.random(n_sample)
    y2 = np.cumsum(y2)
    y3 = list(range(n_sample))
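    # Note: with datetime values on the x-axis the tick labels tend to
    # overlap; calling fig.autofmt_xdate() on the figure (i.e. inside
    # plot_lines) rotates and aligns them, and matplotlib.dates offers
    # finer-grained date tick locators and formatters.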
    plot_lines(x, y1, y2, y3, show=True)</code></pre></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[BERTopic 介绍]]></title>
            <link>https://ross.selfcoding.cn/article/bertopic</link>
            <guid>13fcb5c3-2851-4c33-9174-cbe8bc16d7c4</guid>
            <pubDate>Fri, 02 Sep 2022 00:00:00 GMT</pubDate>
            <description><![CDATA[BERTopic 是最近社区比较热门的一个项目，利用预训练模型可以做到无监督的话题聚类。]]></description>
            <content:encoded><![CDATA[<div id="container" class="max-w-5xl font-medium mx-auto undefined"><main class="notion light-mode notion-page notion-block-13fcb5c328514c339174cbe8bc16d7c4"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><div class="notion-text notion-block-27dad708416c4972a05f601050457410">Bertopic 是最近社区比较热门的一个项目，利用预训练模型可以做到无监督的话题聚类。</div><a target="_blank" rel="noopener noreferrer" href="https://github.com/MaartenGr/BERTopic" class="notion-external notion-external-block notion-row notion-block-c5284f1d6d404b92829b423d10486ff7"><div class="notion-external-image"><svg viewBox="0 0 260 260"><g><path d="M128.00106,0 C57.3172926,0 0,57.3066942 0,128.00106 C0,184.555281 36.6761997,232.535542 87.534937,249.460899 C93.9320223,250.645779 96.280588,246.684165 96.280588,243.303333 C96.280588,240.251045 96.1618878,230.167899 96.106777,219.472176 C60.4967585,227.215235 52.9826207,204.369712 52.9826207,204.369712 C47.1599584,189.574598 38.770408,185.640538 38.770408,185.640538 C27.1568785,177.696113 39.6458206,177.859325 39.6458206,177.859325 C52.4993419,178.762293 59.267365,191.04987 59.267365,191.04987 C70.6837675,210.618423 89.2115753,204.961093 96.5158685,201.690482 C97.6647155,193.417512 100.981959,187.77078 104.642583,184.574357 C76.211799,181.33766 46.324819,170.362144 46.324819,121.315702 C46.324819,107.340889 51.3250588,95.9223682 59.5132437,86.9583937 C58.1842268,83.7344152 53.8029229,70.715562 60.7532354,53.0843636 C60.7532354,53.0843636 71.5019501,49.6441813 95.9626412,66.2049595 C106.172967,63.368876 117.123047,61.9465949 128.00106,61.8978432 C138.879073,61.9465949 149.837632,63.368876 160.067033,66.2049595 C184.49805,49.6441813 195.231926,53.0843636 195.231926,53.0843636 C202.199197,70.715562 197.815773,83.7344152 196.486756,86.9583937 C204.694018,95.9223682 209.660343,107.340889 209.660343,121.315702 C209.660343,170.478725 179.716133,181.303747 151.213281,184.472614 C155.80443,188.444828 
159.895342,196.234518 159.895342,208.176593 C159.895342,225.303317 159.746968,239.087361 159.746968,243.303333 C159.746968,246.709601 162.05102,250.70089 168.53925,249.443941 C219.370432,232.499507 256,184.536204 256,128.00106 C256,57.3066942 198.691187,0 128.00106,0 Z M47.9405593,182.340212 C47.6586465,182.976105 46.6581745,183.166873 45.7467277,182.730227 C44.8183235,182.312656 44.2968914,181.445722 44.5978808,180.80771 C44.8734344,180.152739 45.876026,179.97045 46.8023103,180.409216 C47.7328342,180.826786 48.2627451,181.702199 47.9405593,182.340212 Z M54.2367892,187.958254 C53.6263318,188.524199 52.4329723,188.261363 51.6232682,187.366874 C50.7860088,186.474504 50.6291553,185.281144 51.2480912,184.70672 C51.8776254,184.140775 53.0349512,184.405731 53.8743302,185.298101 C54.7115892,186.201069 54.8748019,187.38595 54.2367892,187.958254 Z M58.5562413,195.146347 C57.7719732,195.691096 56.4895886,195.180261 55.6968417,194.042013 C54.9125733,192.903764 54.9125733,191.538713 55.713799,190.991845 C56.5086651,190.444977 57.7719732,190.936735 58.5753181,192.066505 C59.3574669,193.22383 59.3574669,194.58888 58.5562413,195.146347 Z M65.8613592,203.471174 C65.1597571,204.244846 63.6654083,204.03712 62.5716717,202.981538 C61.4524999,201.94927 61.1409122,200.484596 61.8446341,199.710926 C62.5547146,198.935137 64.0575422,199.15346 65.1597571,200.200564 C66.2704506,201.230712 66.6095936,202.705984 65.8613592,203.471174 Z M75.3025151,206.281542 C74.9930474,207.284134 73.553809,207.739857 72.1039724,207.313809 C70.6562556,206.875043 69.7087748,205.700761 70.0012857,204.687571 C70.302275,203.678621 71.7478721,203.20382 73.2083069,203.659543 C74.6539041,204.09619 75.6035048,205.261994 75.3025151,206.281542 Z M86.046947,207.473627 C86.0829806,208.529209 84.8535871,209.404622 83.3316829,209.4237 C81.8013,209.457614 80.563428,208.603398 80.5464708,207.564772 C80.5464708,206.498591 81.7483088,205.631657 83.2786917,205.606221 C84.8005962,205.576546 86.046947,206.424403 
86.046947,207.473627 Z M96.6021471,207.069023 C96.7844366,208.099171 95.7267341,209.156872 94.215428,209.438785 C92.7295577,209.710099 91.3539086,209.074206 91.1652603,208.052538 C90.9808515,206.996955 92.0576306,205.939253 93.5413813,205.66582 C95.054807,205.402984 96.4092596,206.021919 96.6021471,207.069023 Z" fill="#161614"></path></g></svg></div><div class="notion-external-description"><div class="notion-external-title">BERTopic</div><div class="notion-external-subtitle"><span>MaartenGr</span><span> • </span><span>Updated <!-- -->Mar 5, 2023</span></div></div></a><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-dc18c010dbc6441596dff4b46684f2cb" data-id="dc18c010dbc6441596dff4b46684f2cb"><span><div id="dc18c010dbc6441596dff4b46684f2cb" class="notion-header-anchor"></div><a class="notion-hash-link" href="#dc18c010dbc6441596dff4b46684f2cb" title="主要流程"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">主要流程</span></span></h2><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-b6e04d3b5a7b4bbba5b1ca41eed948bc"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" 
src="https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F077a027b-4b80-4585-96d7-a53e99722ee3%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20230307%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20230307T142359Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D5618bef7e688cf9666f25bbfd95231583021e597f687b5d33f7ab9b20758b900%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&amp;id=b6e04d3b-5a7b-4bbb-a5b1-ca41eed948bc" alt="notion image" loading="lazy" decoding="async"/></div></figure><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-5d236f66db70463ca5a0f2778e297539" data-id="5d236f66db70463ca5a0f2778e297539"><span><div id="5d236f66db70463ca5a0f2778e297539" class="notion-header-anchor"></div><a class="notion-hash-link" href="#5d236f66db70463ca5a0f2778e297539" title="向量提取"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">向量提取</span></span></h3><div class="notion-text notion-block-fd8d1d676da54564ab44978d191b2483">其中默认的 Transformer 为 all-MiniLM-L6-v2，如果用没 finetune 过的 BERT 效果不会那么好。建议采用SimCSE、SBERT 语义相似度模型。</div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-1fb3010eb9c14d9a87c2870038910ce9" data-id="1fb3010eb9c14d9a87c2870038910ce9"><span><div id="1fb3010eb9c14d9a87c2870038910ce9" class="notion-header-anchor"></div><a class="notion-hash-link" href="#1fb3010eb9c14d9a87c2870038910ce9" title="降维"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 
3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">降维</span></span></h3><div class="notion-text notion-block-ac837e5298844d24956076204c1532dc"><a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://zhuanlan.zhihu.com/p/352461768">umap</a> 降维在数据量大的时候可以加快聚类速度，数据量少（千级别及以下）的时候建议不用降维算法。</div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-a9867fc1b77e4dc1b3b501af1e3eab4f" data-id="a9867fc1b77e4dc1b3b501af1e3eab4f"><span><div id="a9867fc1b77e4dc1b3b501af1e3eab4f" class="notion-header-anchor"></div><a class="notion-hash-link" href="#a9867fc1b77e4dc1b3b501af1e3eab4f" title="聚类算法"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">聚类算法</span></span></h3><div class="notion-text notion-block-403d99fb90884e20aca4d1756def95ca">作者采用 HDBSCAN 原因是因为聚类效果稳定，超参少，但是笔者发现使用 HDBSCAN 聚类的精度并不高，簇内常混杂着其他主题的样本，原因见<a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://www.biaodianfu.com/hdbscan.html">机器学习聚类算法之HDBSCAN</a> 。实践得出的结论是在样本比较少的时候使用层次聚类并把目标簇的数目设置得大一些，理由很简单数据少的时候样本在空间内是比较稀疏的；但聚类样本到达一定规模的时候选择 DBSCAN 可以达到比较高的精度。</div><div class="notion-blank notion-block-0926debffd564e9a8b340f2c4423c263"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[拓扑排序算法]]></title>
            <link>https://ross.selfcoding.cn/article/topology-sort</link>
            <guid>da7d7b4c-24d7-46c7-83d2-2ad8a5e55e63</guid>
            <pubDate>Sun, 28 Aug 2022 00:00:00 GMT</pubDate>
            <description><![CDATA[拓扑排序算法介绍]]></description>
            <content:encoded><![CDATA[<div id="container" class="max-w-5xl font-medium mx-auto undefined"><main class="notion light-mode notion-page notion-block-da7d7b4c24d746c783d22ad8a5e55e63"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-cc7ce08213744ef9b0f6524b151626f7" data-id="cc7ce08213744ef9b0f6524b151626f7"><span><div id="cc7ce08213744ef9b0f6524b151626f7" class="notion-header-anchor"></div><a class="notion-hash-link" href="#cc7ce08213744ef9b0f6524b151626f7" title="背景"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">背景</span></span></h2><div class="notion-text notion-block-2e74c372229249899016e6183b634981">对一个<a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://baike.baidu.com/item/%E6%9C%89%E5%90%91%E6%97%A0%E7%8E%AF%E5%9B%BE/10972513">有向无环图</a>(Directed Acyclic Graph简称DAG)G进行拓扑排序，是将G中所有顶点排成一个线性序列，使得图中任意一对顶点u和v，若边&lt;u,v&gt;∈E(G)，则u在线性序列中出现在v之前。通常，这样的线性序列称为满足拓扑次序(Topological Order)的序列，简称拓扑序列。简单的说，由某个集合上的一个<a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://baike.baidu.com/item/%E5%81%8F%E5%BA%8F/2439087">偏序</a>得到该集合上的一个<a target="_blank" rel="noopener noreferrer" class="notion-link" href="https://baike.baidu.com/item/%E5%85%A8%E5%BA%8F/10577699">全序</a>，这个操作称之为拓扑排序。</div><div class="notion-text notion-block-b352a0d8c8fa482a92306619d3e8bf88">输入：有向无环图G</div><div class="notion-text notion-block-1a5cff4595d543be8cdbb9c320b7e58b">输出：结点序列；对于所有的边&lt;u,v&gt;∈E(G)，结点u排在v前面。</div><div 
class="notion-text notion-block-7a729ec898e4400a8b5d573c908f0d67">相关应用：程序处理流程的排序，大学课程表学习顺序问题</div><div class="notion-blank notion-block-da43cac961934190b0e5711766495b30"> </div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-b27349f9e66e4d9e9f567a18c9cd296e"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F65e5adb9-6fab-4af8-95b4-9735e566f45b%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20230307%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20230307T142359Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D329b390a1c368cc679bf71366b52d149af8b025092a223bb171813408f19e46b%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&amp;id=b27349f9-e66e-4d9e-9f56-7a18c9cd296e" alt="notion image" loading="lazy" decoding="async"/></div></figure><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-2d6c5fb459174bcebf96ca081470bec9" data-id="2d6c5fb459174bcebf96ca081470bec9"><span><div id="2d6c5fb459174bcebf96ca081470bec9" class="notion-header-anchor"></div><a class="notion-hash-link" href="#2d6c5fb459174bcebf96ca081470bec9" title="原理"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">原理</span></span></h2><div class="notion-text 
notion-block-fd04103688c4436ea9b731c2b3bb80a8">排序原理：只要保证指向该结点的节点都被遍历过，当前节点方可访问。</div><div class="notion-text notion-block-bea3d5a253454d9da0686d67c1e75e95">如何确定结点可以访问呢？最简单的情况是，结点的入度为0，即没有其他结点指向该结点。</div><div class="notion-text notion-block-a4ee43ddb7474ebb82debe5786e40ee2">可以利用这个特性进行拓扑排序，只需要把访问过的节点和由该节点出发的边删除后，在新的图中入度为0的节点为可访问的结点。</div><div class="notion-text notion-block-154bda4d26dd4d82a2d9e1c403210c03">如上图所示，第一步可访问的结点为A和B，因为其入度为0，可以首先访问A和B，之后把结点A、B边a、b删除，得到新的图。</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-89aebc00a0d146e09017d99e07007a6e"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:480px;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Faa287658-f0d6-41a1-8de3-b9d337da2813%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20230307%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20230307T142359Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D7a89f8772c6f8524e84b9305ae81cdf2db36596a907de60679df60ef8e6d1e69%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&amp;id=89aebc00-a0d1-46e0-9017-d99e07007a6e" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-blank notion-block-8ff38a1310394ee3a77408a462c84bd3"> </div><div class="notion-text notion-block-18ad4a23a37f48899506398cf19e0dfb">访问C和D，之后把以结点C、D出发的边d、f、g、h删除，得到新的图。</div><div class="notion-text notion-block-73f0c22a5cd84a8abaaabec53f9bb31f">访问E、F和G，之后把以结点E、F和G出发的边i、j删除，得到新的图。</div><div class="notion-text notion-block-1bebaf13b81146d094ddb474b676ccc8">访问H，这时候图已经没有结点了，结束排序。</div><div class="notion-blank notion-block-79f7e26762c14ed2a3b54e050e487b4b"> </div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-39db993f2199416485fe8096a5f1c172" 
data-id="39db993f2199416485fe8096a5f1c172"><span><div id="39db993f2199416485fe8096a5f1c172" class="notion-header-anchor"></div><a class="notion-hash-link" href="#39db993f2199416485fe8096a5f1c172" title="代码实现"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">代码实现</span></span></h2><pre class="notion-code"><div class="notion-code-copy"><div class="notion-code-copy-button"><svg fill="currentColor" viewBox="0 0 16 16" width="1em" version="1.1"><path fill-rule="evenodd" d="M0 6.75C0 5.784.784 5 1.75 5h1.5a.75.75 0 010 1.5h-1.5a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-1.5a.75.75 0 011.5 0v1.5A1.75 1.75 0 019.25 16h-7.5A1.75 1.75 0 010 14.25v-7.5z"></path><path fill-rule="evenodd" d="M5 1.75C5 .784 5.784 0 6.75 0h7.5C15.216 0 16 .784 16 1.75v7.5A1.75 1.75 0 0114.25 11h-7.5A1.75 1.75 0 015 9.25v-7.5zm1.75-.25a.25.25 0 00-.25.25v7.5c0 .138.112.25.25.25h7.5a.25.25 0 00.25-.25v-7.5a.25.25 0 00-.25-.25h-7.5z"></path></svg></div></div><code class="language-python">#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Created by Ross


def get_in_degree_zero(graph, visited):
    &quot;&quot;&quot;
    Return the nodes whose in-degree is 0, ignoring already-visited nodes.
    &quot;&quot;&quot;
    node2in_degree = {node: 0 for node in graph.keys() if node not in visited}

    for _from, to_nodes in graph.items():
        if _from in visited:
            continue
        for to_node in to_nodes:
            node2in_degree[to_node] += 1
    return [node for node, in_degree in node2in_degree.items() if in_degree == 0]


def topological_sort(graph):
    &quot;&quot;&quot;Topological sort: repeatedly emit the nodes whose in-degree is 0.&quot;&quot;&quot;
    result = []
    visited = set()
    in_degree_zero_nodes = get_in_degree_zero(graph, visited)
    while in_degree_zero_nodes:
        result.extend(in_degree_zero_nodes)
        visited.update(in_degree_zero_nodes)
        in_degree_zero_nodes = get_in_degree_zero(graph, visited)
    return result
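

# Note: a cyclic graph never exhausts its nodes, because nodes on a cycle
# never reach in-degree 0, so the sort above quietly returns a partial order.
# A small guard (a suggested addition, not part of the original algorithm)
# makes that failure explicit:
def check_complete(graph, order):
    # the order covers every node only when the graph is acyclic
    if len(order) != len(graph):
        raise ValueError  # graph contains a cycle; the order is partial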


if __name__ == &#x27;__main__&#x27;:
    g = {
        &#x27;A&#x27;: [&#x27;C&#x27;, &#x27;D&#x27;],
        &#x27;B&#x27;: [&#x27;D&#x27;],
        &#x27;C&#x27;: [&#x27;E&#x27;, &#x27;F&#x27;],
        &#x27;D&#x27;: [&#x27;F&#x27;, &#x27;G&#x27;],
        &#x27;E&#x27;: [],
        &#x27;F&#x27;: [&#x27;H&#x27;],
        &#x27;G&#x27;: [&#x27;H&#x27;],
        &#x27;H&#x27;: []
    }
    res = topological_sort(g)
    print(res)</code></pre><div class="notion-blank notion-block-39e25f6452ca4e93a3143357dce7d284"> </div></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[短视频主播 Embedding 建模]]></title>
            <link>https://ross.selfcoding.cn/article/user-embedding</link>
            <guid>7e6a4f4a-27bd-4578-a78a-e63d8c7fecce</guid>
            <pubDate>Tue, 02 Aug 2022 00:00:00 GMT</pubDate>
            <description><![CDATA[随着网络世界的发展，越来越多人开始在直播平台上分享内容。对主播进行建模是一项有趣且有挑战性的任务。在视频号中，主播的行为是复杂且多模态的：复杂体现在主播有简介、历史发布过的短视频，也有直播的信息等；而多模态体现在主播的信息包括文字、图片、视频画面、音频、标签信息等。如何把这些复杂的信息压缩成一个 n 维的向量是具有挑战性的。]]></description>
            <content:encoded><![CDATA[<div id="container" class="max-w-5xl font-medium mx-auto undefined"><main class="notion light-mode notion-page notion-block-7e6a4f4a27bd4578a78ae63d8c7fecce"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-17396f4aea4347d896b144c2b172e84d" data-id="17396f4aea4347d896b144c2b172e84d"><span><div id="17396f4aea4347d896b144c2b172e84d" class="notion-header-anchor"></div><a class="notion-hash-link" href="#17396f4aea4347d896b144c2b172e84d" title="Foreword"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Foreword</span></span></h2><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-6094c98a92f147c0bd1724037c15a051" data-id="6094c98a92f147c0bd1724037c15a051"><span><div id="6094c98a92f147c0bd1724037c15a051" class="notion-header-anchor"></div><a class="notion-hash-link" href="#6094c98a92f147c0bd1724037c15a051" title="Background"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Background</span></span></h3><div class="notion-text notion-block-c50f2c7100344c53846ae0b53b9f3615">As the online world grows, more and more people are sharing content on live-streaming platforms, and modeling streamers is an interesting and challenging task. On 视频号, a streamer's behavior is complex and multimodal: complex in that a streamer has a profile, previously published short videos, and live-stream information; multimodal in that streamer information spans text, images, video frames, audio, and tags. Compressing this heterogeneous information into a single n-dimensional vector is challenging.</div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-7d029b3001684dbe8b8d44fc58a6c1f0" data-id="7d029b3001684dbe8b8d44fc58a6c1f0"><span><div id="7d029b3001684dbe8b8d44fc58a6c1f0" class="notion-header-anchor"></div><a class="notion-hash-link" href="#7d029b3001684dbe8b8d44fc58a6c1f0" title="Application Scenarios"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Application Scenarios</span></span></h3><div class="notion-text notion-block-1470cbf0e0ff4270ba6dbbe4b033c7e6">Like other kinds of Embedding, the streamer Embedding can be applied to common retrieval scenarios. Some example use cases:</div><ol start="1" class="notion-list notion-list-numbered notion-block-273d910596a843db9feb505ad45609d8"><li>Similar-streamer retrieval: in operations and anti-abuse scenarios, the streamer Embedding makes it very efficient to find the streamers related to a seed streamer, helping operations teams quickly surface similar streamers.</li></ol><ol start="2" class="notion-list notion-list-numbered notion-block-7cf728f4216f424c9e0a14ea0b2853a8"><li>Streamer cold start: a recommender system predicts users' future behavior and interests from user-item interactions, but when a new streamer joins, cold start is needed to distribute the new streamer to the users most likely to be interested in them, so as to generate effective exposure.</li></ol><ol start="3" class="notion-list notion-list-numbered notion-block-abcb0be7dabb452c8a3df09fe0bed578"><li>Streamer diversification/deduplication: streamers with similar content are also close to each other in the Embedding vector space, so combined with clustering algorithms this enables deduplication and diversification of similar streamers.</li></ol><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-ad4dc500136a4675a8553b291b960d93" data-id="ad4dc500136a4675a8553b291b960d93"><span><div id="ad4dc500136a4675a8553b291b960d93" class="notion-header-anchor"></div><a class="notion-hash-link" 
href="#ad4dc500136a4675a8553b291b960d93" title="Approach"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Approach</span></span></h2><div class="notion-text notion-block-252f2736b4b6443bae91d638f38bfda2">In the information age, data is king: when users' click behavior is directly available, training a streamer Embedding is not difficult. However, click behavior is private user data, and in many cases teams outside the business line cannot access the click logs. As an alternative, the business side can provide a streamer ID-Embedding trained on user behavior, from which a streamer content Embedding can be trained by distillation.</div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-27ae107653f54f668665ab3279f00bfb" data-id="27ae107653f54f668665ab3279f00bfb"><span><div id="27ae107653f54f668665ab3279f00bfb" class="notion-header-anchor"></div><a class="notion-hash-link" href="#27ae107653f54f668665ab3279f00bfb" title="Self-Supervised Streamer Embedding"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Self-Supervised Streamer Embedding</span></span></h3><div class="notion-text notion-block-7cae3741c3b6435da841a7fe5f45d610">When nothing related to user clicks is available, and the difficulty and scale of annotation rule out a manually labeled training set, self-supervised training is an effective option.</div><div class="notion-text notion-block-32488c7ffd29449a873901cfd39c0042">Many self-supervised training schemes have emerged in recent years, of which BERT is the best known; it models text through two pre-training tasks, Masked Language Model (MLM) and Next Sentence Prediction (NSP). MLM randomly masks some tokens in a sentence and predicts them from the unmasked context, learning sentence representations; NSP concatenates two sentences and judges whether the second is really the next sentence of the first, learning inter-sentence similarity. Language models are pre-trained on large unlabeled corpora to strengthen many downstream tasks. Inspired by pre-trained language models, Microsoft recently proposed PTUM, a user model pre-trained on user behaviors, with two user-Embedding pre-training tasks: Masked Behavior Prediction (MBP) and Next K Behaviors Prediction (NBP).</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-b079cf26afac422ab409510e7bddd678"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F86befcd3-3600-469d-9334-d8f5b8ef32ba%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20230307%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20230307T142359Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D4e3cb8da63f54def1c8d3be116ffd3ff2f14f92a92f148f1907545abb8e93c9a%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&amp;id=b079cf26-afac-422a-b409-510e7bddd678" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-b236f65927684fbbbb7ebaec345b2a30">Figure 1: PTUM architecture</div><div class="notion-text notion-block-826381315b364dce94c3845f48361fbe">Similarly, inspired by PTUM, we construct self-supervised training pairs by masking and train without labels via contrastive learning. However, because our scenario involves multiple modalities, BERT/PTUM-style models, which predict the Embedding of a masked element from the unmasked context, do not suit heterogeneous inputs. We therefore adapt the idea with contrastive learning: instead of predicting a masked element's Embedding, the model predicts the Embedding of an augmented view of the streamer. Augmentation randomly masks the input, and to reduce information leakage, the overlap between the visible information of the two views of the same streamer must be kept to a small proportion.</div><div class="notion-text notion-block-b20ec163e1784bef8cfeb43938d38db1">For the loss we use InfoNCE: within a batch, views of the same streamer are treated as positives, and views of different streamers as negatives.</div><div class="notion-blank notion-block-cec48894f15949f2989bd8645bd6e68a"> </div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-685d01b28e6240acae4389b939d05e00"><div 
style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Fafe4fdb6-860a-4c44-aea5-f0a9037d95d2%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20230307%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20230307T142359Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D7ddee01007c9aeb07fc01c6bed4bed08b8c872564984d755910438898c65d772%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&amp;id=685d01b2-8e62-40ac-ae43-89b939d05e00" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-5bbbe999e6ab4fa291337ffc0505b871">Figure 2: contrastive-learning framework for the streamer Embedding</div><h3 class="notion-h notion-h2 notion-h-indent-1 notion-block-0887135d71da4e8cad00d23aa8c890cb" data-id="0887135d71da4e8cad00d23aa8c890cb"><span><div id="0887135d71da4e8cad00d23aa8c890cb" class="notion-header-anchor"></div><a class="notion-hash-link" href="#0887135d71da4e8cad00d23aa8c890cb" title="Transferring Information from ID-Embedding to Streamer Content Embedding"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Transferring Information from ID-Embedding to Streamer Content Embedding</span></span></h3><div class="notion-text notion-block-d08c8d3ad76b4c0dbc0bac2afd33c099">The ID-Embedding maps users and items into a vector space trained on click logs, while the streamer content Embedding builds a profile from the streamer's basic information and historical behavior (past short videos and live streams). Since users with the same interests generally interact with similar streamers (for example, NBA fans interact more with basketball streamers), streamers with similar content are also close in the ID-Embedding space, and we aim to transfer the useful information from the ID-Embedding into the streamer content Embedding. Two knowledge-transfer approaches we have tried in practice are described below.</div><h4 class="notion-h notion-h3 notion-h-indent-2 notion-block-62e30a2871cb4a96b380c9dd35872fe4" data-id="62e30a2871cb4a96b380c9dd35872fe4"><span><div id="62e30a2871cb4a96b380c9dd35872fe4" class="notion-header-anchor"></div><a class="notion-hash-link" href="#62e30a2871cb4a96b380c9dd35872fe4" title="Distillation-Based Knowledge Transfer"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Distillation-Based Knowledge Transfer</span></span></h4><div class="notion-text notion-block-e097035fa18d438d9c210999087d958a">When only the teacher model's output (the ID-Embedding) is available and the teacher's parameters are not, some candidate distillation schemes are:</div><ul class="notion-list notion-list-disc notion-block-bdb9446b78314c20974e66eb99f9c373"><li>The simplest is to distill the ID-Embedding directly onto the student model's feature map.</li></ul><ul class="notion-list notion-list-disc notion-block-a1f06cdc07f442f3868f1f2fb7ecc58c"><li>Run clustering on the ID-Embedding to extract pseudo-labels, then train the student as a classifier on those pseudo-labels.</li></ul><ul class="notion-list notion-list-disc notion-block-313942cc53164285ac77149ab4e1f303"><li>Distill the relations between samples; a classic scheme is the RKD loss, which builds a similarity matrix over the ID-Embeddings within a batch as the supervision signal, so that the student model learns the inter-sample relations.</li></ul><div class="notion-text notion-block-efa437fa87754532b2640b22f62a6e9f">However, these distillation approaches have two shortcomings:</div><ul class="notion-list notion-list-disc notion-block-b83224a465124549b2423ec9f8f75c1c"><li>Poor noise resistance: if the ID-Embedding is noisy, the Embedding the student model learns inherits that noise.</li></ul><ul class="notion-list notion-list-disc notion-block-290834ace89048f1ba533ac8232ad6cf"><li>Under-use of the pre-trained model: because distillation at the feature-map level crosses domains, direct distillation can destroy knowledge the pre-trained model has already learned, so constraints must be added during distillation.</li></ul><div class="notion-text 
notion-block-5b69d35a5bc34590bc71d6ef2b2b5cf5">In short, the teacher model's performance ceiling bounds the student model's, and in the cross-domain setting the distilled information is further degraded.</div><h4 class="notion-h notion-h3 notion-h-indent-2 notion-block-89c032d75aaa43c0b62fa42f61b808d0" data-id="89c032d75aaa43c0b62fa42f61b808d0"><span><div id="89c032d75aaa43c0b62fa42f61b808d0" class="notion-header-anchor"></div><a class="notion-hash-link" href="#89c032d75aaa43c0b62fa42f61b808d0" title="Contrastive-Learning-Based Knowledge Transfer"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Contrastive-Learning-Based Knowledge Transfer</span></span></h4><div class="notion-text notion-block-194c979c243a45c4b051bc5b03568966">To address direct distillation's poor noise resistance and poor use of the pre-trained model, we transfer knowledge with retrieval-constructed positives plus contrastive learning. Constructing positives via retrieval serves to denoise the data; the contrastive learning is supervised, with the same structure as Figure 2, except that a positive pair consists of a query sample and a similar sample retrieved for it, while the negatives are all other samples in the batch. The whole training process is built on top of the pre-trained model and can fully exploit its strengths.</div><div class="notion-text notion-block-c45a3c25d22542bc88adfc9037128752">Because the ID-Embedding is trained on user clicks, its notion of similarity is not semantic content similarity, so retrieved samples that are not semantically similar in content are filtered out of the ID-Embedding recall results. Many filtering methods work, including computing edit distance and scoring similarity with a separate similarity model.</div><div class="notion-text notion-block-afd041d16f114a1cb03b1bc1af50d8de">Unlike unsupervised contrastive learning, the supervised variant can obtain hard positives/negatives; the task is harder, and the resulting model is more robust when used in real scenarios.</div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-4bbd1bf75f6540aea49147d5841daf51"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:594px;max-width:100%;flex-direction:column"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F4ff82fd0-12df-46e2-ba13-861743d60660%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20230307%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20230307T142359Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D26d6b42227f2038fdd9c68bcbe1fe90391555c75183088fbadec9ed6e0a99a2a%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&amp;id=4bbd1bf7-5f65-40ae-a491-47d5841daf51" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-efa5fb7067bf44088cd21816001a5c87">In real business scenarios the ID-Embedding is usually quite noisy, whereas with contrastive knowledge transfer the whole data-cleaning process is relatively controllable and training can still take full advantage of the pre-trained model.</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-3ce6e35becd0489ab53dd3072ea3c1ab" data-id="3ce6e35becd0489ab53dd3072ea3c1ab"><span><div id="3ce6e35becd0489ab53dd3072ea3c1ab" class="notion-header-anchor"></div><a class="notion-hash-link" href="#3ce6e35becd0489ab53dd3072ea3c1ab" title="Summary"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Summary</span></span></h2><div class="notion-text notion-block-77b42fdb44ef494590a56924bcc0b05a">This post explored possible ways to build a streamer Embedding without user click data. Our experiments compared the two knowledge-transfer approaches: contrastive-learning-based transfer clearly outperforms distillation-based transfer, and the data-cleaning step has a large impact on the final result. A streamer-level Embedding has abundant information available, but data on a UGC platform is also very noisy, so denoising and condensing the data is especially important; we must handle not only missing modalities but also meaningless modality information that destabilizes retrieval quality. In the future we will explore more data-mining methods and modality-interaction designs to better describe the content tendencies a streamer represents.</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-e4c65103d0134dbc97c0a288962c69a7" 
data-id="e4c65103d0134dbc97c0a288962c69a7"><span><div id="e4c65103d0134dbc97c0a288962c69a7" class="notion-header-anchor"></div><a class="notion-hash-link" href="#e4c65103d0134dbc97c0a288962c69a7" title="References"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">References</span></span></h2><ol start="1" class="notion-list notion-list-numbered notion-block-a03b1af8e0e9401782dc574da81673c4"><li>Wu, C., Wu, F., Qi, T., Lian, J., Huang, Y., &amp; Xie, X. (2020). PTUM: Pre-training user model from unlabeled user behaviors via self-supervision. <em>Findings of the Association for Computational Linguistics Findings of ACL: EMNLP 2020</em>, 1939–1944. https://doi.org/10.18653/v1/2020.findings-emnlp.174</li></ol><ol start="2" class="notion-list notion-list-numbered notion-block-e7ab4b3f30ba4e27bff7fbbe983d8453"><li>Gao, T., Yao, X., &amp; Chen, D. (2021). SimCSE: Simple Contrastive Learning of Sentence Embeddings. <em>EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings</em>, 6894–6910. https://doi.org/10.18653/v1/2021.emnlp-main.552</li></ol><ol start="3" class="notion-list notion-list-numbered notion-block-67ac045c470947bd97b1397d59b9bce4"><li>Devlin, J., Chang, M. W., Lee, K., &amp; Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. <em>NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference</em>, <em>1</em>, 4171–4186. 
https://doi.org/10.48550/arxiv.1810.04805</li></ol><ol start="4" class="notion-list notion-list-numbered notion-block-01352e8e181b48e883628d91774af045"><li>Hinton, G., Vinyals, O., &amp; Dean, J. (2015). <em>Distilling the Knowledge in a Neural Network</em>. https://doi.org/10.48550/arxiv.1503.02531</li></ol><ol start="5" class="notion-list notion-list-numbered notion-block-7c091c8417524f8aa4e85ceed9ff7064"><li>Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., &amp; Bengio, Y. (2015, December 19). FitNets: Hints for thin deep nets. <em>3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings</em>. https://doi.org/10.48550/arxiv.1412.6550</li></ol><ol start="6" class="notion-list notion-list-numbered notion-block-910b8c9b304649488de4ec05cf0b08fe"><li>Park, W., Kim, D., Lu, Y., &amp; Cho, M. (2019). Relational knowledge distillation. <em>Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</em>, <em>2019</em>-<em>June</em>, 3962–3971. https://doi.org/10.1109/CVPR.2019.00409</li></ol></main></div>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[A Brief Introduction to the RSI Indicator]]></title>
            <link>https://ross.selfcoding.cn/article/5a6d6b7e-da61-42f6-80c5-636f89e35f80</link>
            <guid>5a6d6b7e-da61-42f6-80c5-636f89e35f80</guid>
            <pubDate>Tue, 12 Jul 2022 00:00:00 GMT</pubDate>
            <description><![CDATA[RSI, the Relative Strength Index, measures the relative strength of buying versus selling pressure over a period of time; the higher the value, the stronger the buying pressure. It is widely used in technical analysis.]]></description>
            <content:encoded><![CDATA[<div id="container" class="max-w-5xl font-medium mx-auto undefined"><main class="notion light-mode notion-page notion-block-5a6d6b7eda6142f680c5636f89e35f80"><div class="notion-viewport"></div><div class="notion-collection-page-properties"></div><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-2ddf4d35b8634f86b8c120978bbe8bdc"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F05b98e49-c55d-4ef9-9def-099883f71fb4%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20230307%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20230307T142400Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D9c3e6499474b508589e1fa3aab046f9946538f9ca912ebdb20d595149a4b94a8%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&amp;id=2ddf4d35-b863-4f86-b8c1-20978bbe8bdc" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-e667fa49389240c8916db60cc5b0e584">RSI (Relative Strength Index) measures the relative strength of buying versus selling pressure over a period of time; the higher the value, the stronger the buying pressure. It is widely used in technical analysis.</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-766942e4553246669e933824635f7ed5" data-id="766942e4553246669e933824635f7ed5"><span><div id="766942e4553246669e933824635f7ed5" class="notion-header-anchor"></div><a class="notion-hash-link" href="#766942e4553246669e933824635f7ed5" title="Calculation Method"><svg viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Calculation Method</span></span></h2><div class="notion-text notion-block-e751d0e25a38435e825fdbd2ac09a9ed">It is computed from the ratio of the average gain to the average loss over a period, normalized into the range 0–100. Formally:</div><div class="notion-text notion-block-45187c42b15c4b59b8af35a47bee3946">Suppose that over n days, a days closed up and b days closed down, with n = a + b.</div><div class="notion-text notion-block-d2be0b2ff9914791a8f8171730c57418">The total daily gain over this period is:</div><span role="button" tabindex="0" class="notion-equation notion-equation-block"><span>U = \sum_{i=1}^{n} u_i</span></span><span role="button" tabindex="0" class="notion-equation notion-equation-block"><span>u_i = \max(close_i - close_{i-1},\ 0)</span></span><div class="notion-text notion-block-90a14b0215b546098c93ebcfcb67a05c">where <span role="button" tabindex="0" class="notion-equation notion-equation-inline"><span>u_i</span></span> is the gain on day i and <span role="button" tabindex="0" class="notion-equation notion-equation-inline"><span>close_i</span></span> is the closing price on day i.</div><div class="notion-text notion-block-d063dc14c1f74aff934238496fd539c9">Similarly, the total decline over the period is:</div><span role="button" tabindex="0" class="notion-equation notion-equation-block"><span>D = \sum_{i=1}^{n} d_i</span></span><span role="button" tabindex="0" class="notion-equation notion-equation-block"><span>d_i = \max(close_{i-1} - close_i,\ 0)</span></span><div class="notion-text notion-block-267053308eb1425391c22e8a8d72e599">RSI is then derived from the ratio of the average gain to the average loss over this period:</div><span role="button" tabindex="0" class="notion-equation notion-equation-block"><span>RS = \frac{U/n}{D/n} = \frac{U}{D}</span></span><span role="button" tabindex="0" class="notion-equation notion-equation-block"><span>RSI = 100 - \frac{100}{1+RS} = \frac{U}{U+D} \times 100</span></span><div class="notion-text notion-block-6bffec214e404fe5bf679d41a4545706">where the period n is typically taken as 7 or 14.</div><h2 class="notion-h notion-h1 notion-h-indent-0 notion-block-6c1e757bdb15437c9b6e5385cf4db02d" data-id="6c1e757bdb15437c9b6e5385cf4db02d"><span><div id="6c1e757bdb15437c9b6e5385cf4db02d" class="notion-header-anchor"></div><a class="notion-hash-link" href="#6c1e757bdb15437c9b6e5385cf4db02d" title="Properties"><svg 
viewBox="0 0 16 16" width="16" height="16"><path fill-rule="evenodd" d="M7.775 3.275a.75.75 0 001.06 1.06l1.25-1.25a2 2 0 112.83 2.83l-2.5 2.5a2 2 0 01-2.83 0 .75.75 0 00-1.06 1.06 3.5 3.5 0 004.95 0l2.5-2.5a3.5 3.5 0 00-4.95-4.95l-1.25 1.25zm-4.69 9.64a2 2 0 010-2.83l2.5-2.5a2 2 0 012.83 0 .75.75 0 001.06-1.06 3.5 3.5 0 00-4.95 0l-2.5 2.5a3.5 3.5 0 004.95 4.95l1.25-1.25a.75.75 0 00-1.06-1.06l-1.25 1.25a2 2 0 01-2.83 0z"></path></svg></a><span class="notion-h-title">Properties</span></span></h2><ul class="notion-list notion-list-disc notion-block-fe056dcd595d4256bb1bccea5a38d98d"><li>The RSI indicator carries no ordering information, because the calculation ignores the order in which up and down days occur.</li></ul><ul class="notion-list notion-list-disc notion-block-326617162ba7496283c1fa007ab9dc01"><li>In sharply rising or falling markets a short-period RSI can be inaccurate; switching to a chart with a larger calculation period helps, for example replacing the 1-hour RSI with the daily RSI.</li></ul><ul class="notion-list notion-list-disc notion-block-756d6db358f14c73a36ea81d098a7572"><li>RSI saturation refers to extreme RSI readings caused by an overly one-sided trend over a period, in which case the indicator may no longer reflect relative strength well.</li></ul><ul class="notion-list notion-list-disc notion-block-5968bfdd19804bd281b2b4502d891f0a"><li>RSI divergence means the RSI trend and the price trend disagree; it comes in two forms, bullish and bearish divergence:</li><ul class="notion-list notion-list-disc notion-block-5968bfdd19804bd281b2b4502d891f0a"><li>Bullish divergence: price makes a new low while the RSI moves up, meaning buying pressure is strengthening; this can be a cue to consider entering a long position.</li><li>Bearish divergence: price makes a new high while the RSI moves down, meaning selling pressure is strengthening; this can be a cue to consider entering a short position.</li><ul class="notion-list notion-list-disc notion-block-795ba82dcefd4e16bb065e6577e2e2f3"><figure class="notion-asset-wrapper notion-asset-wrapper-image notion-block-b8c1a7fe46ae4bda80fedbedfbeb0e6c"><div style="position:relative;display:flex;justify-content:center;align-self:center;width:100%;max-width:100%;flex-direction:column;height:100%"><img style="object-fit:cover" src="https://www.notion.so/image/https%3A%2F%2Fs3.us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F9a73a0a0-4ff4-4c01-9fba-bdee1f90c872%2FUntitled.png%3FX-Amz-Algorithm%3DAWS4-HMAC-SHA256%26X-Amz-Content-Sha256%3DUNSIGNED-PAYLOAD%26X-Amz-Credential%3DAKIAT73L2G45EIPT3X45%252F20230307%252Fus-west-2%252Fs3%252Faws4_request%26X-Amz-Date%3D20230307T142400Z%26X-Amz-Expires%3D86400%26X-Amz-Signature%3D39c3314ced764ea1736adae2ba8a5bbc297e8a197dd2cef6d68ff5224b0b41a9%26X-Amz-SignedHeaders%3Dhost%26x-id%3DGetObject?table=block&amp;id=b8c1a7fe-46ae-4bda-80fe-dbedfbeb0e6c" alt="notion image" loading="lazy" decoding="async"/></div></figure><div class="notion-text notion-block-31fb9b9e1e0d4f85a0f75238638a6635">(Illustration of bearish divergence)</div><div class="notion-blank notion-block-d8ff620bfdb849c5a737cc1fa4eda7d7"> </div></ul></ul></ul></main></div>]]></content:encoded>
        </item>
    </channel>
</rss>