Saving Scrapy (Python) data to a MongoDB database, with Navicat

By 墨香-15607781945 · Published October 15, 2022 · Updated October 29, 2022

Resource links

MongoDB6 https://fastdl.mongodb.org/windows/mongodb-windows-x86_64-6.0.2-signed.msi

navicat http://dl.kxdw.com/pc/Navicat16.rar?key=2cef9b482853bbbf7c7f7d69449f1745&uskey=a02ecc6de4b6c497c601ea0289a34ef590f212eb

MongoDB Compass query tool

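        # Count the documents that match a condition (here: title exists and is a string)
        # ctall = tb.count_documents(filter={'title': {'$regex': '.*'}})
        # Get the total number of documents (a fast estimate from collection metadata)
        ctall = tb.estimated_document_count()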
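
The two counting calls trade accuracy for speed. A minimal sketch, assuming `tb` is a pymongo `Collection` as in the snippet above; the helper name `count_summary` is hypothetical and only for illustration:

```python
def count_summary(tb, field):
    """Return (documents that contain `field`, estimated total) for collection `tb`."""
    # count_documents runs a real query: exact, but it has to examine matching documents
    with_field = tb.count_documents({field: {'$exists': True}})
    # estimated_document_count reads collection metadata: fast, and it takes no filter
    total = tb.estimated_document_count()
    return with_field, total
```

Note that `$exists: True` also counts documents whose field is null, whereas the `$regex` filter in the commented-out line only matches string values.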

Comparison of query styles
        find_it = tb.find(
            sort=[('_id', -1)],
            # Single-condition query
            # filter={'title': {'$regex': find_kw}},
            # filter={'$and':[{'img':{'$exists':True}}]},
            # Multi-condition (OR) query
            # filter={'$or': [{'title': {'$regex': find_kw, '$options': 'imsx'}},
            #                 {'keywords': {'$regex': find_kw, '$options': 'imsx'}},
            #                 {'description': {'$regex': find_kw, '$options': 'imsx'}}]},
            # AND combined with OR
            # filter={'$and': [{"img": {'$exists': False}, },
            #                  {'$or': [{'title': {'$regex': find_kw, '$options': 'imsx'}},
            #                           {'keywords': {'$regex': find_kw, '$options': 'imsx'}},
            #                           {'description': {'$regex': find_kw, '$options': 'imsx'}}]},
            #                  ]},
            # AND of several OR groups
            filter={'$and': [{'$or': [{"img": {'$exists': False}},
                                      {"img": "null"}, {"img": "None"}, {"img": None}, {"img": ""}]},
                             {'$or': [{'title': {'$regex': find_kw, '$options': 'imsx'}},
                                      {'keywords': {'$regex': find_kw, '$options': 'imsx'}},
                                      {'description': {'$regex': find_kw, '$options': 'imsx'}}]},
                             ]},
            limit=results_per_page,
            skip=skips,
            # skip={}
        )
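
The query above relies on `find_kw`, `results_per_page` and `skips` being defined elsewhere. One way to derive them is sketched below. The helper names `page_to_skip` and `build_search_filter` are hypothetical, and two choices here differ from the original: the keyword is escaped with `re.escape` so that it is matched literally, and only the `i` option is used.

```python
import re


def page_to_skip(page, results_per_page):
    """Convert a 1-based page number into the `skip` value passed to find()."""
    return max(page - 1, 0) * results_per_page


def build_search_filter(find_kw, fields=('title', 'keywords', 'description')):
    """Match documents with no usable img AND find_kw in any of `fields`."""
    pattern = re.escape(find_kw)  # treat the keyword as literal text, not as a regex
    no_img = {'$or': [{'img': {'$exists': False}},
                      {'img': {'$in': [None, '', 'null', 'None']}}]}
    has_kw = {'$or': [{f: {'$regex': pattern, '$options': 'i'}} for f in fields]}
    return {'$and': [no_img, has_kw]}
```

With these helpers, page 3 at 20 results per page would be requested as `tb.find(filter=build_search_filter('jk'), sort=[('_id', -1)], limit=20, skip=page_to_skip(3, 20))`.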

Fuzzy (regex) search
{title: {$regex:/00|jk/}} 

Fuzzy search that ignores case (plus the m, s and x options): {title: {$regex:/hd/,$options:"imsx"}}
{$or:[{title: {$regex:/hd/,$options:"imsx"}} ,{description: {$regex:/jk/,$options:"imsx"}} ]}

Multi-condition OR search
{ $or : [{"title" : /.*波多.*/i}, {"description" : /.*jk.*/i}] }

Within a time range
({"START_TIME":{"$gte":ISODate("2021-08-03 07:59:06"),"$lte":ISODate("2021-09-01 08:30:46")}})
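
In Python the same range query takes `datetime` objects instead of `ISODate(...)`. A small sketch follows; the helper name `time_range_filter` is hypothetical, and pymongo treats naive `datetime` values as UTC:

```python
from datetime import datetime


def time_range_filter(field, start, end):
    """Build an inclusive range filter: start <= field <= end."""
    if start > end:
        raise ValueError('start must not be later than end')
    return {field: {'$gte': start, '$lte': end}}


query = time_range_filter('START_TIME',
                          datetime(2021, 8, 3, 7, 59, 6),
                          datetime(2021, 9, 1, 8, 30, 46))
# Usage, with tb an assumed pymongo Collection: tb.find(query)
```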

Compound conditions
Relational database: select * from testOrAnd where (state1=11 and state2=22) or value >= 300
MongoDB: db.getCollection('testOrAnd').find(
{$or:[{$and:[{"state1":11},{"state2":22}]},{"value":{$gte:300}} ]  }
)
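
To check that the nesting of `$or` and `$and` says what is intended, it can help to restate the condition in plain Python and run it over a few sample documents. This is only an illustration: the `matches` function and the sample data are made up, and `query` mirrors the MongoDB filter above.

```python
# The same filter as a Python dict, ready to pass to a pymongo find()
query = {'$or': [{'$and': [{'state1': 11}, {'state2': 22}]},
                 {'value': {'$gte': 300}}]}


def matches(doc):
    """Plain-Python restatement of the condition, for checking the logic."""
    return ((doc.get('state1') == 11 and doc.get('state2') == 22)
            or doc.get('value', float('-inf')) >= 300)


samples = [{'state1': 11, 'state2': 22, 'value': 1},
           {'state1': 11, 'state2': 0, 'value': 300},
           {'state1': 0, 'state2': 22, 'value': 299}]
print([matches(d) for d in samples])  # [True, True, False]
```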

Exact-match query
db.user.find({$or:[{name:{$eq:'小博'}},{name:{$eq:'测试小博'}}]})

Remove a field from all documents (Python, MongoDB)
    delete_kw = 'play_src'
    delete = tb.update_many(
        filter={delete_kw: {"$exists": True}},
        update={'$unset': {delete_kw: None}},
        upsert=False,

    )
    # print(delete)
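
Printing the result object is not very informative. The `UpdateResult` returned by `update_many` exposes `modified_count`, which a small wrapper can return. A sketch, assuming `tb` is a pymongo `Collection`; the name `remove_field` is hypothetical:

```python
def remove_field(tb, field):
    """Remove `field` from every document that has it; return how many changed."""
    result = tb.update_many(
        {field: {'$exists': True}},   # only touch documents that contain the field
        {'$unset': {field: ''}},      # the value given to $unset is ignored
    )
    return result.modified_count
```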

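Collect the values of one field (Python, MongoDB)
    find_kw = 'domain'
    # Only fetch documents that actually contain the field, and only that field
    find_it = tb.find(
        filter={find_kw: {'$exists': True}},
        projection={find_kw: 1},
    )
    domains = []
    for i in find_it:
        # print(i[find_kw])
        domains.append(i[find_kw])
    # Remove duplicates
    domains = list(set(domains))
    print(domains)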
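
The de-duplication can also be left to the server with `Collection.distinct`, which avoids pulling every document into Python. A minimal sketch, again assuming `tb` is a pymongo `Collection`; the wrapper name is hypothetical:

```python
def distinct_values(tb, field):
    """Return the de-duplicated values of `field`, computed by the server."""
    return tb.distinct(field)
```

For example, `distinct_values(tb, 'domain')` would return a list of the distinct `domain` values.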


MongoDB random query
model.aggregate([{ $sample: { size: 1 } }]);
// size sets the number of documents to return
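
The same `$sample` stage can be run from Python through `Collection.aggregate`. A sketch under the assumption that `tb` is a pymongo `Collection`; the function name is hypothetical:

```python
def random_documents(tb, size=1):
    """Return `size` randomly chosen documents from collection `tb`."""
    if size < 1:
        raise ValueError('size must be a positive integer')
    # aggregate() returns a cursor; list() drains it into memory
    return list(tb.aggregate([{'$sample': {'size': size}}]))
```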

$exists: check whether a field exists
Because MongoDB is a non-relational database, documents may contain different fields, and some documents may lack a field that others have. To filter on whether a field is present, use $exists.
Field exists
db.getCollection("user").find({age:{$exists:1}})
db.getCollection("user").find({age:{$exists:true}})
Field does not exist
db.getCollection("user").find({age:{$exists:0}})
db.getCollection("user").find({age:{$exists:false}})

Reference: https://huaweicloud.csdn.net/63356eadd3efff3090b56b38.html
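
One detail worth keeping in mind: `$exists: true` also matches documents where the field is present but null. The sketch below builds the filter as a Python dict and optionally excludes nulls. The helper name and its `allow_null` parameter are hypothetical.

```python
def exists_filter(field, present=True, allow_null=True):
    """Build a filter on whether `field` is present in a document."""
    if present and not allow_null:
        # Present and not null: $ne None rules out null values
        return {field: {'$exists': True, '$ne': None}}
    return {field: {'$exists': bool(present)}}


# Usage, with tb an assumed pymongo Collection: tb.find(exists_filter('age'))
```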

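Introduction to the $regex operator
MongoDB uses the $regex operator to specify a regular expression for matching strings, with PCRE (Perl Compatible Regular Expressions) as the regular expression language.
$regex operator syntax
{<field>:{$regex:/pattern/,$options:'<options>'}}
{<field>:{$regex:'pattern',$options:'<options>'}}
{<field>:{$regex:/pattern/<options>}}
Regular expression object
{<field>: /pattern/<options>}
Differences between $regex and a regular expression object:
Inside the $in operator, only regular expression objects can be used, for example: {name:{$in:[/^joe/i,/^jack/]}}
When several conditions on the same field are combined with an implicit $and (a comma-separated list), only $regex can be used, for example: {name:{$regex:/^jo/i, $nin:['john']}}
When the options include x or s, only $regex with $options can be used, for example: {name:{$regex:/m.*line/,$options:"si"}}

Using the $regex operator
The options of the $regex operator change the default matching behavior. There are four options, i, m, x and s, with the following meanings:
i: case-insensitive, {<field>:{$regex:/pattern/i}}. With i set, letters in the pattern match both upper and lower case.
m: multiline mode, {<field>:{$regex:/pattern/,$options:'m'}}. The m option changes the default behavior of the ^ and $ metacharacters so that they match at the start and end of each line rather than at the start and end of the whole input string.
x: ignore unescaped whitespace, {<field>:{$regex:/pattern/,$options:'x'}}. With x set, unescaped whitespace in the regular expression is ignored and a hash sign (#) starts a comment. This option can only be given explicitly in $options.
s: single-line mode, {<field>:{$regex:/pattern/,$options:'s'}}. With s set, the dot (.) metacharacter matches every character, including the newline (\n). This option can only be given explicitly in $options.
Points to note when using the $regex operator:
i, m, x and s can be combined, for example: {name:{$regex:/j*k/,$options:"si"}}
Running a regex match on an indexed field can speed up the query, and the query is faster still when the regular expression is a prefix expression, for example: {name:{$regex: /^joe/}}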
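
The effect of the four options can be tried out locally before sending a query. The sketch below uses Python's `re` module, which is a different engine from the one MongoDB uses, but whose `re.I`, `re.M`, `re.S` and `re.X` flags behave analogously for these simple cases. The sample text is made up.

```python
import re

text = 'first line\nJOE second line'

checks = {
    'i':         bool(re.search(r'joe', text, re.I)),       # case-insensitive
    'm':         bool(re.search(r'^JOE', text, re.M)),      # ^ matches at each line start
    'without m': bool(re.search(r'^JOE', text)),            # ^ matches only at string start
    's':         bool(re.search(r'line.JOE', text, re.S)),  # . also matches the newline
    'without s': bool(re.search(r'line.JOE', text)),        # . stops at the newline
    'x':         bool(re.search(r'JOE \s second  # a comment', text, re.X)),
}
print(checks)

# The equivalent pymongo filter passes the options as a string:
mongo_filter = {'name': {'$regex': 'm.*line', '$options': 'si'}}
```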

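When working with MongoDB, you can use the Compass GUI tool to browse document data visually.

On the Documents tab, the three boxes that the arrows point to are where you enter the query conditions.

Where:

filter: similar to the condition after WHERE in an SQL statement. It must be written as key-value pairs, where the key is the field name.

project: selects which fields are returned (a projection), for example {title: 1, _id: 0}. It is not used here; for more detail see http://www.cnblogs.com/ljhdo/p/5019837.html.

sort: as the name suggests, the sort order. It is also written as key-value pairs, where the key is the field to sort by and the value is 1 (ascending) or -1 (descending).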
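
The three Compass boxes correspond to the `filter`, `projection` and `sort` arguments of pymongo's `find`. A minimal sketch, assuming `tb` is a pymongo `Collection` whose documents have a `title` field; the function name `find_titles` is hypothetical:

```python
def find_titles(tb, keyword, limit=10):
    """Return up to `limit` titles containing `keyword`, newest _id first."""
    cursor = tb.find(
        filter={'title': {'$regex': keyword, '$options': 'i'}},  # Compass "filter"
        projection={'title': 1, '_id': 0},                       # Compass "project"
        sort=[('_id', -1)],                                      # Compass "sort"
        limit=limit,
    )
    return [doc['title'] for doc in cursor]
```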

Setting a unique index in MongoDB

Under the Indexes tab:
Create index
Index fields: choose the field
Select a type: choose asc
Options: tick Create unique index (unique, no duplicates)
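
The same index can be created from Python with `Collection.create_index`. A sketch, assuming `tb` is a pymongo `Collection`; the wrapper name is hypothetical, and so is the `url` field in the usage note:

```python
def ensure_unique_index(tb, field):
    """Create an ascending unique index on `field` and return the index name."""
    # Direction 1 means ascending, matching the "asc" choice in Compass
    return tb.create_index([(field, 1)], unique=True)
```

For example, `ensure_unique_index(tb, 'url')`. Creating the index fails if the collection already holds duplicate values for that field, and once it exists, inserting a duplicate raises `pymongo.errors.DuplicateKeyError`.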

Configuring Scrapy (Python) to save to MongoDB

# python -m pip install pymongo

# settings.py configuration

ITEM_PIPELINES = {
   'getAllM3u8.pipelines.MongoDBPipeline': 300,
}

# pipelines.py configuration

import pymongo


class MongoDBPipeline(object):
    # DB_URL and DB_NAME are hard-coded here for now; they can be moved into settings later
    DB_URL = 'mongodb://localhost:27017/'
    DB_NAME = 'm3u8db'

    def __init__(self):
        # Connect to the database server
        # self.client = pymongo.MongoClient(host='localhost', port=27017)
        self.client = pymongo.MongoClient(self.DB_URL)
        # Select the database (created on first write)
        self.db = self.client[self.DB_NAME]
        # Select the collection (created on first write)
        self.table = self.db['m3u8tb']

    def process_item(self, item, spider):
        # print('___________________++++++', dict(item))
        self.table.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        # Close the connection when the spider finishes
        self.client.close()
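
The comment in the pipeline mentions moving the connection details into the settings later. One possible shape for that is sketched below, using Scrapy's `from_crawler` hook. The class name and the setting names `MONGO_URL`, `MONGO_DB_NAME` and `MONGO_COLLECTION` are assumptions made for this example, not names from the original project.

```python
class SettingsMongoDBPipeline:
    """Variant of the pipeline that reads its connection details from settings.py."""

    def __init__(self, db_url, db_name, collection_name):
        self.db_url = db_url
        self.db_name = db_name
        self.collection_name = collection_name

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this to build the pipeline and passes in the crawler
        s = crawler.settings
        return cls(
            db_url=s.get('MONGO_URL', 'mongodb://localhost:27017/'),
            db_name=s.get('MONGO_DB_NAME', 'm3u8db'),
            collection_name=s.get('MONGO_COLLECTION', 'm3u8tb'),
        )

    def open_spider(self, spider):
        import pymongo  # imported here so the class can be loaded without pymongo
        self.client = pymongo.MongoClient(self.db_url)
        self.table = self.client[self.db_name][self.collection_name]

    def process_item(self, item, spider):
        self.table.insert_one(dict(item))
        return item

    def close_spider(self, spider):
        self.client.close()
```

To use it, the class would be registered in `ITEM_PIPELINES` in place of `MongoDBPipeline`, and the three settings added to settings.py.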
