feature

python - 使用来自 sklearn.feature_extraction.text.TfidfVectorizer 的 TfidfVectorizer 计算 IDF

我认为函数TfidfVectorizer没有正确计算IDF因子。例如，从tf-idffeatureweightsusingsklearn.feature_extraction.text.TfidfVectorizer复制代码:fromsklearn.feature_extraction.textimportTfidfVectorizercorpus=["Thisisverystrange","Thisisverynice"]vectorizer=TfidfVectorizer(use_idf=True,#utilizaoidfcomopeso,fazendotf*idfnorm=Non

python - sklearn随机森林索引feature_importances_如何做

我在sklearn中使用了RandomForestClassifier来确定数据集中的重要特征。我如何能够返回实际的特征名称(我的变量标记为x1、x2、x3等)而不是它们的相对名称(它告诉我重要的特征是“12”、“22”等)。以下是我目前用于返回重要功能的代码。important_features=[]forx,iinenumerate(rf.feature_importances_):ifi>np.average(rf.feature_importances_):important_features.append(str(x))printimportant_features此外，为了

feature_importances importances code pre important python scikit-learn random-forest feature-selection

python - TensorFlow 内部错误 : Unable to get element as bytes

我正在尝试使用TensorFlow对一些包含分类和数字数据混合的日志数据运行DNNClassifier。我已经创建了特征列来指定和存储/散列tensorflow的数据。当我运行代码时，我收到“无法将元素作为字节获取”内部错误。注意:我不想删除此article中所述的Nan值所以我使用此代码将它们转换为0train=train.fillna(0,axis=0)所以我不确定为什么我仍然收到此错误。如果我删除Nan，那么它会起作用，但我不想删除Nan，因为我觉得模型需要它们进行训练。defcreate_train_input_fn():returntf.estimator.inputs.pa

TensorFlow element column feature feature_column python jupyter-notebook

python - scikit 学习 : desired amount of Best Features (k) not selected

我正在尝试使用卡方(scikit-learn0.10)选择最佳特征。从总共80个训练文档中，我首先提取了227个特征，并从这227个特征中选择前10个特征。my_vectorizer=CountVectorizer(analyzer=MyAnalyzer())X_train=my_vectorizer.fit_transform(train_data)X_test=my_vectorizer.transform(test_data)Y_train=np.array(train_labels)Y_test=np.array(test_labels)X_train=np.clip(X_tr

Features selected True code False python machine-learning scikit-learn chi-squared

python - 在 pypi 上注册包时为 "Server response (401): You must login to access this feature"

我正在尝试在pyPI上注册一个包。在创建一个看起来像的.pypirc之后[distutils]#thistellsdistutilswhatpackageindexesyoucanpushtoindex-servers=pypipypitest[pypi]repository:https://pypi.python.org/pypiusername:"amfarrell"password:"Idontpostmypassphrasepublicly"[pypitest]repository:https://testpypi.python.org/pypiusername:"amfarr

amp response section pypi python setuptools distutils

python - SkLearn 多项式 NB : Most Informative Features

由于我的分类器在测试数据上产生了大约99%的准确率，我有点怀疑并想深入了解我的NB分类器最有用的特征，看看它正在学习什么样的特征。以下主题非常有用:Howtogetmostinformativefeaturesforscikit-learnclassifiers?至于我的特征输入，我仍在尝试，目前我正在使用CountVectorizer测试一个简单的unigram模型:vectorizer=CountVectorizer(ngram_range=(1,1),min_df=2,stop_words='english')关于上述主题，我发现了以下函数:defshow_most_inform

Informative Features 16.2420 2420 section python machine-learning scikit-learn classification text-classification

python - 投票分类器 : Different Feature Sets

我有两个不同的特征集(因此，行数相同且标签相同)，在我的例子中DataFrames:df1:|A|B|C|-------------|1|4|2||1|4|8||2|1|1||2|3|0||3|2|5|df2:|E|F|---------|6|1||1|3||8|1||2|8||5|2|标签:|labels|----------|5||5||1||7||3|我想用它们来训练VotingClassifier。但是拟合步骤只允许指定单个特征集。目标是使clf1与df1和clf2与df2相匹配。eclf=VotingClassifier(estimators=[('df1-clf',clf1

Different Feature code pre estimators python machine-learning scikit-learn

python - 快速信息增益计算

我需要为文本分类的>10k文档中的>100k特征计算信息增益分数。下面的代码工作正常，但完整数据集的速度非常慢-在笔记本电脑上需要一个多小时。数据集是20newsgroup，我正在使用scikit-learn，chi2scikit中提供的功能运行速度非常快。知道如何更快地计算此类数据集的信息增益吗？definformation_gain(x,y):def_entropy(values):counts=np.bincount(values)probs=counts[np.nonzero(counts)]/float(len(values))return-np.sum(probs*np.l

python 快速 feature entropy set performance machine-learning scikit-learn feature-selection

python - 如何在 sklearn 管道中获取通过特征消除选择的特征名称？

我在我的sklearn管道中使用递归特征消除，管道看起来像这样:fromsklearn.pipelineimportFeatureUnion,Pipelinefromsklearnimportfeature_selectionfromsklearn.feature_extraction.textimportTfidfVectorizerfromsklearn.svmimportLinearSVCX=['Iamasentence','anexample']Y=[1,2]X_dev=['anothersentence']#classifierLinearSVC1=LinearSVC(tol

何在 sklearn 39 feature pipeline python machine-learning scikit-learn

python - 值错误 : Feature not in features dictionary

我正在尝试使用TensorFlow编写一个简单的深度机器学习模型。我正在使用我在Excel中制作的玩具数据集，只是为了让模型工作并接受数据。我的代码如下:importpandasaspdimportnumpyasnpimporttensorflowastfraw_data=np.genfromtxt('ai/mock-data.csv',delimiter=',',dtype=str)my_data=np.delete(raw_data,(0),axis=0)#deletesthefirstrow,axis=0indicatesrow,axis=1indicatescolumnmy_d

dictionary features 39 column code python numpy tensorflow

13 14 151617 18 19