Growth Mindset 200

2025-11-07 Unsort Unsort

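Pull yourself together and take the first step!

Learning Ability and Continuous Growth

The Feynman Technique

Set a clear goal: why learn, what to learn, and how to learn? (Why / What / How)

Learning large language models -> Why? To improve my skills at work -> What? LLM-related technologies -> How? Books

Improving social skills -> Why? My social skills are weak -> What? Practice communication and build confidence -> How? Actively join events and meet new people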

conda-env-clone-install-create-error-with-mirror-custom-channel

2024-08-30 Unsort Unsort

Cause of the problem

Cloning the base environment suddenly stopped working.

rm -rf pyspark3602; conda create --name pyspark3602 --clone base
Source:      /home/hdp_lbg_ectech/wangke/app/anaconda3
Destination: /home/hdp_lbg_ectech/wangke/app/anaconda3/envs/pyspark3602
The following packages cannot be cloned out of the root environment:
 - http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/linux-64::conda-4.9.2-py36h5fab9bb_0
 - defaults/linux-64::conda-build-3.10.5-py36_0
Packages: 195
Files: 64657

...

CondaHTTPError: HTTP 404 NOT FOUND for url <http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/noarch/keras-applications-1.0.8-py_1.tar.bz2>
Elapsed: 00:00.351600

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

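First problem

The main problem here is the Tsinghua mirror. The mirror only provides packages with the .conda extension and has no .tar.bz2 files, so conda fails with CondaHTTPError: HTTP 404 NOT FOUND for url and the environment cannot be created.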
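
To check this for a given package, one option is to derive the sibling URL in the other archive format and request both by hand. The sketch below only rewrites the file extension. The helper name `alternate_format_url` is hypothetical, and whether the mirror actually serves either file still has to be verified against the mirror itself.

```python
# The two package archive formats conda uses.
CONDA_EXTENSIONS = (".tar.bz2", ".conda")


def alternate_format_url(url: str) -> str:
    """Return the same package URL with the other archive extension.

    Hypothetical helper for illustration; it does not contact the mirror.
    """
    old, new = CONDA_EXTENSIONS
    if url.endswith(old):
        return url[: -len(old)] + new
    if url.endswith(new):
        return url[: -len(new)] + old
    raise ValueError(f"not a conda package URL: {url}")


# The URL from the 404 error above.
failed = (
    "http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/noarch/"
    "keras-applications-1.0.8-py_1.tar.bz2"
)
print(alternate_format_url(failed))
```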

save DataFrame as partition of hive table

2023-04-27 Data Data

Create a Hive table directly from a DataFrame and store the data as one of its partitions.

test 1

table
    .write
    .format("hive")
    .mode("overwrite")
    .option("path", inputPath + "_table")
    .insertInto(tableName)

This fails: the table has to be created first.

Exception in thread "main" org.apache.spark.sql.AnalysisException: Table not found: hdp_lbg_ectech_ads.zp_compensate_ad_detail_test1;

test 2

First check whether the table exists.
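
One way to handle both cases, sketched here in PySpark rather than the Scala used above, is to create the table on the first write and insert into it afterwards. This assumes PySpark 3.3 or later (for `spark.catalog.tableExists`), a Hive-enabled session, and a partition column in the DataFrame. `write_partition` and `partition_col` are names made up for illustration; this is not the original test 2 code.

```python
def write_partition(spark, df, table_name, partition_col):
    """Write df into a partitioned Hive table, creating the table if needed.

    Hypothetical helper; assumes PySpark >= 3.3 and a Hive-enabled session.
    """
    if spark.catalog.tableExists(table_name):
        # insertInto matches columns by position, so df must follow the
        # table's column order, with the partition column last.
        df.write.mode("overwrite").insertInto(table_name)
    else:
        # First write: let Spark create the table from the DataFrame schema.
        (
            df.write.format("hive")
            .mode("overwrite")
            .partitionBy(partition_col)
            .saveAsTable(table_name)
        )
```

Which partitions an overwrite replaces depends on the session's partition overwrite settings, so it is worth trying this on test data before pointing it at a real table.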

ssh git github permission denied problem

2022-11-03 Unsort Git Github Ssh

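Problem description

I used to map GitHub's IP addresses in /etc/hosts. One day pushes suddenly became very slow and often timed out. I then deleted all the GitHub entries from hosts, but after that I could not connect to GitHub at all, even though I was using a proxy tool.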

pyspark udf udaf with jar

2022-10-21 Unsort Unsort

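Problem description

I wrote a UDAF in Scala. It works in a Scala program but cannot be used from PySpark.

There are two ways to use the UDAF:

The first is to register it through Hive: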

ss.sql("CREATE TEMPORARY FUNCTION MostFreq22 AS 'com.company.strategy.rank.bussiness.util.udf.MostFreqUDAF' USING JAR '%s'" % jar_path)
# spark.jars is not set in SparkConf
# error
# pyspark.sql.utils.AnalysisException: Can not load class 'com.company.strategy.rank.bussiness.util.udf.MostFreqUDAF' when registering the function 'MostFreq22', please make sure it is on the classpath

With spark.jars set, the error still occurs.

Dropout notes

2021-04-27 Tf Dl Tf Dl

How it works

Dropout randomly drops some (input) neurons to keep the parameters from overfitting.

Applies Dropout to the input.

Dropout consists in randomly setting a fraction rate of input units to 0 at each update during training time, which helps prevent overfitting. The units that are kept are scaled by 1 / (1 - rate), so that their sum is unchanged at training time and inference time.
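
The scaling rule in the quote can be illustrated with a few lines of plain Python. This is a minimal sketch of the idea (often called inverted dropout), not how TensorFlow implements the layer; the function name and the `rng` parameter are made up for the example.

```python
import random


def dropout(units, rate, training=True, rng=random):
    """Apply dropout to a list of floats, scaling the kept units."""
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0:
        # At inference time the input passes through unchanged.
        return list(units)
    scale = 1.0 / (1.0 - rate)
    # Each unit is zeroed with probability `rate`; survivors are scaled up.
    return [0.0 if rng.random() < rate else u * scale for u in units]


rng = random.Random(0)
out = dropout([1.0] * 10000, rate=0.2, rng=rng)
print(sum(out) / len(out))  # expected to be near 1.0
```

With rate = 0.2, each kept unit is multiplied by 1 / 0.8 = 1.25, so the expected sum of the output equals the sum of the input.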

hello

2021-04-04 Unsort Unsort

Believe in the future, embrace the future!

Working with multiple Git setups

2021-04-03 Unsort Unsort

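There are two cases of multi-Git use:

Multiple Git accounts (user, email)
Multiple credentials (identities)

Setting up multiple Git accounts

Prerequisite: Git version (git --version) >= 2.13

vim ~/.gitconfig

Note: the path after gitdir: must end with a slash /.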
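
As a minimal sketch of what that file might contain, using Git's conditional includes (`includeIf`, available from Git 2.13): the directory `~/work/`, the file name `~/.gitconfig-work`, and the names and addresses are placeholders, not taken from the original post.

```ini
# ~/.gitconfig
[user]
    name = Personal Name
    email = personal@example.com

# Applies to every repository under ~/work/ (note the trailing slash).
[includeIf "gitdir:~/work/"]
    path = ~/.gitconfig-work
```

The included file `~/.gitconfig-work` would then hold its own `[user]` section with the work name and email, which overrides the default inside those repositories.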

proxy

2019-06-10 Unsort Unsort

2025-09-21 update

Clash X may no longer be maintained; on macOS, clash-party is recommended.

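Original post

I have used one service for a long time (from 2018 until now, 2025; stable and simple). It is called MEWU, and I strongly recommend it.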

AutoHotKey

2019-05-27 Unsort Unsort

1 Download AutoHotKey

Official website (direct download link on the official site)

2 Edit the script

Save the script below as capslock_plus.ahk

Note: the script below only works with AutoHotKey 1.x, not 2.x (v2 differs from the other versions!). Direct download

May reading

2019-05-12 Reading Reading

2019-5-12

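I have recently been planning to learn a deep learning framework. Kaggle is a good platform, so I picked one of its competitions, Jigsaw Unintended Bias in Toxicity Classification. The fourth paragraph of the competition description explains the background and the problems with the existing technology: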

mecab

2019-05-07 Unsort Unsort

Installing MeCab with pip on Windows:

pip install mecab-python3

This fails with:

 'mecab-config' is not recognized as an internal or external command, operable program or batch file.

I looked for material online. Some of the Japanese articles were hard to follow, for example this one: Windows環境でのMeCab(Python)のインストール (no need to open it).
