conda-env-clone-install-create-error-with-mirror-custom-channel


Cause of the problem

Cloning the base environment suddenly stopped working.

rm -rf pyspark3602; conda create --name pyspark3602 --clone base
Source:      /home/hdp_lbg_ectech/wangke/app/anaconda3
Destination: /home/hdp_lbg_ectech/wangke/app/anaconda3/envs/pyspark3602
The following packages cannot be cloned out of the root environment:
 - http://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/linux-64::conda-4.9.2-py36h5fab9bb_0
 - defaults/linux-64::conda-build-3.10.5-py36_0
Packages: 195
Files: 64657

...

CondaHTTPError: HTTP 404 NOT FOUND for url <http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/noarch/keras-applications-1.0.8-py_1.tar.bz2>
Elapsed: 00:00.351600

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.

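First problem

The main issue here is the use of the Tsinghua mirror. The Tsinghua mirror only has files with the .conda suffix and none with the .tar.bz2 suffix, so conda reports CondaHTTPError: HTTP 404 NOT FOUND for url and the environment cannot be created.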
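To check this diagnosis for a particular package, it can help to derive both archive-format URLs from the failing one and probe each by hand (for example in a browser). The sketch below only builds the two URLs and fetches nothing; the helper name archive_variants is hypothetical.

```python
def archive_variants(url):
    """Return the .tar.bz2 and .conda URLs for the same package build."""
    suffixes = (".tar.bz2", ".conda")
    for suffix in suffixes:
        if url.endswith(suffix):
            stem = url[: -len(suffix)]
            # Same directory and build string, only the archive format differs.
            return {s: stem + s for s in suffixes}
    raise ValueError("not a conda package URL: " + url)


# URL taken from the CondaHTTPError message above.
failing = ("http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/noarch/"
           "keras-applications-1.0.8-py_1.tar.bz2")
for suffix, candidate in archive_variants(failing).items():
    print(suffix, candidate)
```

If only the .conda URL resolves on the mirror, that is consistent with the explanation above.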

save DataFrame as partition of hive table

Create a Hive table directly from a DataFrame and store the data as one of its partitions.

test 1

table
    .write
    .format("hive")
    .mode("overwrite")
    .option("path", inputPath + "_table")
    .insertInto(tableName)

Error: the table has to be created first.

Exception in thread "main" org.apache.spark.sql.AnalysisException: Table not found: hdp_lbg_ectech_ads.zp_compensate_ad_detail_test1;

test 2

First check whether the table exists.
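One possible way to do that check, sketched in PySpark rather than the Scala used in test 1. The helper name table_exists is hypothetical; it relies only on Catalog.listTables, and `spark` is assumed to be an existing SparkSession with Hive support enabled.

```python
def table_exists(spark, database, table):
    """Return True if `table` is listed in `database` of the session catalog."""
    # Catalog.listTables returns entries with a `name` attribute.
    # Hive keeps table names in lower case, so compare case-insensitively.
    wanted = table.lower()
    return any(t.name.lower() == wanted
               for t in spark.catalog.listTables(database))
```

When this returns False for hdp_lbg_ectech_ads.zp_compensate_ad_detail_test1, the table must be created before insertInto is called; otherwise the AnalysisException from test 1 is raised again.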

pyspark udf udaf with jar

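Problem description

A UDAF written in Scala works in a Scala program but cannot be used from pyspark.

There are two ways to use the UDAF:

The first is to register it through Hive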

ss.sql("CREATE TEMPORARY FUNCTION MostFreq22 AS 'com.company.strategy.rank.bussiness.util.udf.MostFreqUDAF' USING JAR '%s'" % jar_path)
# spark.jars is not set in SparkConf
# error
# pyspark.sql.utils.AnalysisException: Can not load class 'com.company.strategy.rank.bussiness.util.udf.MostFreqUDAF' when registering the function 'MostFreq22', please make sure it is on the classpath

With spark.jars set, the error still occurs.

Dropout notes

Principle

How dropout works: it randomly drops some (input) neurons to keep the parameters from overfitting.

Applies Dropout to the input.

Dropout consists in randomly setting a fraction rate of input units to 0 at each update during training time, which helps prevent overfitting. The units that are kept are scaled by 1 / (1 - rate), so that their sum is unchanged at training time and inference time.
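The scaling rule can be illustrated with a minimal pure-Python sketch of "inverted" dropout. This is not the Keras implementation; the function name and signature are chosen here for illustration only.

```python
import random


def dropout(units, rate, training=True, rng=random):
    """Zero each unit with probability `rate` and scale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(units)  # inference: inputs pass through unchanged
    scale = 1.0 / (1.0 - rate)  # keeps the expected sum equal to the input sum
    return [0.0 if rng.random() < rate else u * scale for u in units]


random.seed(0)
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5))                  # training
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, training=False))  # inference
```

With rate=0.5 every surviving unit is doubled, so the sum matches the inference-time sum on average, though not for every individual draw.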

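Working with multiple git setups

There are two multi-git scenarios:

  • multiple git accounts (user, email)
  • multiple identities (authentication)

Setting up multiple git accounts

Prerequisite: git version (git --version) >= 2.13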

vim ~/.gitconfig

Note: the path after gitdir: must end with a slash /
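As an illustration of what goes into ~/.gitconfig, the sketch below renders conditional-include stanzas (the includeIf feature that arrived in git 2.13) and enforces the trailing slash. The directory layout and the per-account file names are assumptions made for this example, as is the helper name.

```python
def include_if_stanza(gitdir, config_path):
    """Render an includeIf stanza, making sure gitdir ends with a slash."""
    if not gitdir.endswith("/"):
        gitdir += "/"  # with the slash, the condition covers every repo below
    return '[includeIf "gitdir:{}"]\n    path = {}\n'.format(gitdir, config_path)


# Hypothetical layout: work repos under ~/work, personal ones under ~/personal.
print(include_if_stanza("~/work", "~/.gitconfig-work"))
print(include_if_stanza("~/personal/", "~/.gitconfig-personal"))
```

Each included file would then hold its own [user] section with the name and email for that account.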

save-win10-spotlight

Save Windows 10 Spotlight wallpapers to the D: drive!

assets/save-win10-spotlight/1546063951450.png

1 Save the text below as win10_spotlight.bat (or download it directly)

rem https://blog.csdn.net/qq_34260368/article/details/78364055
rem https://blog.csdn.net/linbounconstraint/article/details/80191846
rem https://blog.csdn.net/Anymake_ren/article/details/51125609
rem https://stackoverflow.com/questions/17587347/batch-file-to-run-xcopy-without-overwriting-existing-files

@echo off
MD wallpaper
xcopy "%UserProfile%\AppData\Local\Packages\Microsoft.Windows.ContentDeliveryManager_cw5n1h2txyewy\LocalState\Assets" "D:\spotlight\" /S /Y /D

D:
cd "D:\spotlight\"
ren * *.jpg
pause

2 Double-click the file to run it, then open D:\spotlight\ to see the result
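For readers who prefer a script to a batch file, the same idea can be sketched in Python. This is a rough equivalent rather than a translation: it copies each file under a new name ending in .jpg instead of renaming after the copy, and it skips files that already exist in the destination. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_as_jpg(src_dir, dst_dir):
    """Copy every file in src_dir into dst_dir with a .jpg suffix added."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        target = dst / (item.name + ".jpg")
        # Skip sub-directories and wallpapers saved by an earlier run.
        if item.is_file() and not target.exists():
            shutil.copy2(item, target)
            copied.append(target.name)
    return copied
```

To reproduce the batch file, pass the Assets folder from the xcopy line as src_dir and D:\spotlight as dst_dir.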