NCTF's site

Created2026-04-29|Updated2026-04-29|algorithm•DP

给你一个字符串 s，找到 s 中最长的回文子串。 link 动态规划:注意要不要连续!!! 对布尔变量的动态规划:P表示从i到j的连续子串是回文 P(i,j)=P(i+1,j−1)∧(s[i]==s[j])p(i,i)=truep(i,i+1)=(s[i]==s[i+1])\begin{aligned} P(i,j)&=P(i+1,j-1)\wedge (s[i]==s[j])\\ p(i,i)&=true\\ p(i,i+1)&=(s[i]==s[i+1]) \end{aligned} P(i,j)p(i,i)p(i,i+1)=P(i+1,j−1)∧(s[i]==s[j])=true=(s[i]==s[i+1]) #include <iostream>#include <string>#include <vector>using namespace std;class Solution {public: string longestPalindrome(string s) { ...

codingnote_devide

Created2026-04-27|Updated2026-04-29|algorithm•divide and conquer

题目:leetcode 给定两个大小分别为 m 和 n 的正序（从小到大）数组 nums1 和 nums2。请你找出并返回这两个正序数组的中位数。算法的时间复杂度应该为 O(log (m+n)) 。求中位数实际上可以归结为分奇偶性的求第k小 class Solution {public: int getKthElement(const vector<int>& nums1, const vector<int>& nums2, int k) { /* 主要思路：要找到第 k (k>1) 小的元素，那么就取 pivot1 = nums1[k/2-1] 和 pivot2 = nums2[k/2-1] 进行比较 * 这里的 "/" 表示整除 * nums1 中小于等于 pivot1 的元素有 nums1[0 .. k/2-2] 共计 k/2-1 个 * nums2 中小于等于 pivot2 的元素有 nums2[0 .. k/2...

cuda note

Created2026-04-21|Updated2026-04-25|reading note|parallel

参考自英伟达官方教程 link lambda note from zhihu zhihu CUDA Made Easy execution space execution space:where the code execute partitioned into host(cpu) and device(typically gpu) by default it runs on host,you need to specify which code to run on device #include <thrust/execution_policy.h>#include <thrust/universal_vector.h>#include <thrust/transform.h>#include <cstdio>int main() { float k = 0.5; float ambient_temp = 20; thrust::universal_vector<float> temp&...

cs149_3

Created2026-04-18|Updated2026-04-26|course note|parallel

GPU Architecture & CUDA Programming history origin targets:rendering 3D image 使用三角形描述,给定图片各点的材料,camera,计算投影的样子 early scientiffic computation: 指定区域,fragment shader function 换成实际要执行的函数 2007之前:gpu只有计算graphics pipeline 的interface 2007之后的架构: Let’s say a user wants to run a non-graphics program on the GPU’s programmable cores… Application can allocate buffers in GPU memory and copy data to/from buffers Application (via graphics driver) provides GPU a single kernel program binary Application te...

cs149 part 2

Created2026-04-09|Updated2026-04-21|course note|parallel

Parallelizing Code More in ISPC semantics 使用foreach:实际上是指定了任务(gang)总共迭代这么多次,编译器负责具体方案当我们试图在多个程序实例中对同一个变量进行操作(race condition) 返回:应该是uniform的,返回多个程序实例的变量实际上没什么意义 e.g:计算数组之和正确做法是在每个程序实例引入一个partial 局部变量,任务完成后将所有程序实例的partial合起来 (无论其实现逻辑是SIMD还是多线程都会产生竞争) ispc gang is actually implemented by SIMD 写的时候像是SPMD(多个逻辑线程),但是优化的时候要考虑它实际上是SIMD 通过programindex,programcount等底层暴露才能在需要的时候优化性能 task:实际上是一个线程池(固定八个线程取任务来做) Case study on thought process of writing and optimizing a parallel program 区分:硬件线程和操作系统线...

numpy and pandas

Created2026-04-01|Updated2026-04-02|data analysis|small piece

numpy NumPy对于轴的编号由外向内，从行到列 In [4]: a.reshape(2,5,2)Out[4]:array([[[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9]], [[10, 11], [12, 13], [14, 15], [16, 17], [18, 19]]])In [6]: a.reshape(1,2,5,2)Out[6]:array([[[[ 0, 1], [ 2, 3], [ 4, 5], [ 6, 7], [ 8, 9]], [[10, 11], [12, 13], [14, 15], [16, 17], [18, 19]]]]) 广播的条件：维度数相同（如都是三维），某array的一个或多个维度的长度是1/维度数不同，添加长...

cs149 part one

Created2026-03-29|Updated2026-04-21|course note|parallel

Why Parallelism? Why Efficiency? observation: communication limits the maximun speedup of parallel imbalance of work load limits the speedup communication can be the domain limits 历史上cpu性能提升极快,并行没那么有用近年来,并行计算和提高时钟周期极大提升性能计算机视角:程序只是一段指令序列 superscalar processor execution processor自动去找彼此独立的指令来做并行 instruction level parallelism(ILP) 通常到4就上不去了(不专门设计的程序的并行性有限) 很多地方会使用专用处理器(NPU,TPU,GPU) memory:just an array of bytes(dram这些是具体的implementation) A Modern Multi-Core Processor (Part I) 示例:利用泰勒...

无监督域自适应模型压缩的迭代知识提取和剪枝

Created2026-03-16|Updated2026-03-16|pruning|paper reading

文献阅读记录地址:Iterative knowledge distillation and pruning for model compression in unsupervised domain adaptation abstract 深度学习实际应用中的两个问题:训练数据与测试数据分布不一致,不充足的标签数据无监督域适应学习(unsupervised domain adaptaion,UDA),是解决这一问题的重要技术现有UDA难以满足实时推理和资源受限场景的要求,模型压缩则会降低表现提出方法ITMC(iterative transfer model compression):交替执行TKD(transfer knowledge distillation),和ACP(adaptive channel pruning) 论文贡献: 提出ITMC,迭代式进行蒸馏和剪枝 TKD和ACP,动态剪枝. related work UDA(Unsupervised Domain Adaptation) UDA是用来解决训练集和测试集的数据分布迁移导致的性能下降的,通常我们的...

math_for_ML

Created2026-03-04|Updated2026-04-28|reading note|math

东北大学《人工智能的数学基础》,笔者自学参考教材为人工智能的数学基础(冯朝路等) 特征向量与矩阵分析有右逆行满秩,左逆列满秩非方阵:n×k,若n>k,其n维体积为0,k维体积不为0 MDS 保持变换前后空间内距离保持不变代码参考:https://0809zheng.github.io/2021/07/28/mds.html def MDS(data, k): d, N = data.shape D = np.zeros([N, N]) # 计算距离矩阵D for i in range(N): for j in range(N): D[i, j] = np.sqrt(np.sum((data[:,i]-data[:,j])**2)) # 计算内积矩阵B T1 = np.sum(D**2, axis=0, keepdims=True)/N T2 = np.sum(D**2, axis=1, keepdims=True)/N T3 = np.sum(D**2, keepdims=Tru...

mml

Created2026-03-04|Updated2026-04-26|reading note|math

东北大学《人工智能的数学基础》,笔者自学参考教材为 1.mathematics for machine learning 参考教材链接如下 mml-book linear algebra(from book) 这部分参考MIT 线代和 3b1b笔记食用 general linear group(一般线性群):可逆矩阵关于矩阵乘法构成一般线性群,很明显这不是阿贝尔群向量子空间:对于一个线性空间V,若 U∈VU\in VU∈V 且线性空间操作限制结果在U内,则U为一个向量子空间生成集(generating set):对一个线性空间V，其所有向量都能由A中向量线性组合表示则A为V的生成集(基是最小的生成集) AB=CCij=∑kaikbkjCkT=ABkTCi=AiB\begin{aligned} AB=C\\ C_{ij}=\sum_k a_{ik}b_{kj}\\ C^T_k=AB_k^T\\ C_i=A_iB \end{aligned} AB=CCij=k∑aikbkjCkT=ABkTCi=AiB 基变换的神奇理解: 向量在某基向量下的坐标右乘上基向量...