Comments (14)

milinddethe15 commented on May 21, 2024

Hi, I want to work on this. Assign it to me.

edoardottt commented on May 21, 2024

Done @milinddethe15, let me know if you have any doubts or need guidance.

milinddethe15 commented on May 21, 2024

Hi @edoardottt, there are already some duplicate links in README.md:

[ ERR ] DUPLICATE FOUND!
- [C99.nl](https://api.c99.nl/)
- [HackerTarget](https://hackertarget.com/ip-tools/)
- [IntelligenceX](https://intelx.io/)
- [PhoneBook](https://phonebook.cz/)
- [Rapid7 - DB](https://www.rapid7.com/db/)
- [RocketReach](https://rocketreach.co/)
- [SynapsInt](https://synapsint.com/)
- [Vulmon](https://vulmon.com/)
- [wannabe1337.xyz](https://wannabe1337.xyz/)

We need to fix these before running the script in the workflow.

milinddethe15 commented on May 21, 2024

Sorry @edoardottt, bugmenot is not a duplicate. I created a duplicate of bugmenot to test the script and forgot to discard it.

milinddethe15 commented on May 21, 2024

Thank you @edoardottt! I learned a lot about Bash scripting and GitHub Actions from this issue.

milinddethe15 commented on May 21, 2024

Hi @edoardottt, what do you mean by 'devel branches'?

edoardottt commented on May 21, 2024

> Hi @edoardottt, what do you mean by 'devel branches'?

Sorry, I'm working on multiple repos. There's only the main branch here. Sorry for the mistake.

edoardottt commented on May 21, 2024

Thanks @milinddethe15

1. How did you run the script?

If I run the script locally this is what I get:

$> ./scripts/check-dups.sh 
[ OK! ] NO DUPLICATES FOUND.
350 links in README.

2. Clearly those are duplicate entries, but they are fine in the sense that those services cover multiple areas, so it's okay for a single service to appear under both DNS and domain results, for example.

As an example:

cat README.md | grep Vulmon
- [Vulmon](https://vulmon.com/) - Vulnerability and exploit search engine
- [Vulmon](https://vulmon.com/) - Vulnerability and exploit search engine

There are two entries, but in different categories (one under vulnerabilities and the other under exploits).

3. The best solution would be to check for duplicates within each category; a duplicated entry inside the same category would be an actual error.

milinddethe15 commented on May 21, 2024

Updated script:

#!/bin/bash

readme="README.md"

pwd=$(pwd)

if [[ "${pwd: -7}" == "scripts" ]];
then
    readme="../README.md"    
fi

# Function to extract links from a section and check for duplicates
check_section() {
    section=$1
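    # Take everything between this section's "### " heading and the next heading,
    # extract the markdown link URLs, and keep only the ones that repeat.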
    section_content=$(awk -v section="$section" '/^### / {p=0} {if(p)print} /^### '"$section"'/ {p=1}' "$readme")
    duplicate_links=$(echo "$section_content" | grep -oP '\[.*?\]\(\K[^)]+' | sort | uniq -d)

    if [[ -n $duplicate_links ]]; then
        echo "[ ERR ] DUPLICATE LINKS FOUND IN SECTION: $section"
        echo "$duplicate_links"
    else
        echo "[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: $section"
    fi
}

# Get all unique section headings from the README file and handle spaces and slashes
sections=$(grep '^### ' "$readme" | sed 's/^### //' | sed 's/[\/&]/\\&/g')

# Call the function for each section
for section in $sections; do
    check_section "$section"
done
$ ./scripts/check-dups.sh 
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: General
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Search
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Engines
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Servers
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Vulnerabilities
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Exploits
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Attack
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Surface
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Code
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Mail
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Addresses
[ ERR ] DUPLICATE LINKS FOUND IN SECTION: Domains
https://spyonweb.com/
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: URLs
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: DNS
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Certificates
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: WiFi
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Networks
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Device
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Information
[ ERR ] DUPLICATE LINKS FOUND IN SECTION: Credentials
https://bugmenot.com/
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Leaks
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Hidden
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Services
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Social
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Networks
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Phone
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Numbers
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Images
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Threat
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Intelligence
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Web
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: History
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Surveillance
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: cameras
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Unclassified
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Not
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: working
awk: warning: escape sequence `\/' treated as plain `/'
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: \/
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Paused

There are duplicate links in some categories; I will fix them.
Should I finalise this updated script?

edoardottt commented on May 21, 2024

Amazing! Yes, you can create a new issue for deleting duplicates and open a PR removing them.

  • For spyonweb we can delete the first entry, while (I may be wrong on this) bugmenot is not a duplicate... am I wrong? In the Credentials section there is only one entry for bugmenot.

We should also fix this part:

[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Not
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: working
awk: warning: escape sequence `\/' treated as plain `/'
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: \/
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Paused

These should be treated as a single category, Not Working / Paused.

Also this:

[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: General
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Search
[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: Engines

should be treated as a single category, General Search Engines.

IMO the script should always finish, but if duplicates are found it should exit with code 1.
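
For example, a rough sketch of that pattern (reduced to a single whole-file pass just to show the exit handling; inside a per-section loop the same flag would be set for each failing section):

#!/bin/bash
# Sketch only: report problems first, decide the exit code at the very end.

readme="README.md"
status=0

# Every URL that appears more than once anywhere in the README.
duplicates=$(grep -oP '\[.*?\]\(\K[^)]+' "$readme" | sort | uniq -d)

if [[ -n $duplicates ]]; then
    echo "[ ERR ] DUPLICATE LINKS FOUND:"
    echo "$duplicates"
    status=1        # remember the failure instead of exiting immediately
fi

exit "$status"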

edoardottt commented on May 21, 2024

Super, there is only one error to correct :)

milinddethe15 commented on May 21, 2024

Hi @edoardottt,
In the previous script I was not able to solve the issue you mentioned in your reply, where multi-word category names should be treated as a single category (please give me your input on this error).
So I have updated the script so that if a duplicate link is found, it prints the duplicate link and exits with code 1.


readme="README.md"

pwd=$(pwd)

if [[ "${pwd: -7}" == "scripts" ]];
then
    readme="../README.md"    
fi

# Function to extract links from a section and check for duplicates
check_section() {
    section=$1
    section_escaped=$(sed 's/[&/\]/\\&/g' <<< "$section")
    section_content=$(awk -v section="$section" '/^### / {p=0} {if(p)print} /^### '"$section"'/ {p=1}' "$readme")
    duplicate_links=$(echo "$section_content" | grep -oP '\[.*?\]\(\K[^)]+' | sort | uniq -d)
    if [[ -n $duplicate_links ]]; then
        echo "[ ERR ] DUPLICATE LINKS FOUND"
        echo "$duplicate_links"
        exit 1
    fi
}

# Get all unique section headings from the README file and handle spaces and slashes
sections=$(grep '^### ' "$readme" | sed 's/^### //' | sed 's/[\/&]/\\&/g')

# Call the function for each section
for section in $sections; do
    check_section "$section"
done

Running this script:

$ ./scripts/check-dups.sh 
awk: warning: escape sequence `\/' treated as plain `/'

gives this warning, and I am not able to resolve it.

Please give me your input on which script to use and how to fix its error.
Thanks.
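
For reference, one way around both remaining problems could look like this (a hedged sketch assuming bash with GNU grep and gawk, not necessarily the code that should be merged). Reading each heading as a whole line keeps multi-word names like "Not Working / Paused" intact, and passing the heading to awk with -v and comparing it as a plain string removes the need for sed escaping, which is what triggers the `\/' warning:

#!/bin/bash
# Sketch: per-section duplicate check with whole-line headings, no regex escaping,
# and a single exit code decided after every section has been checked.

readme="README.md"
[[ "$(pwd)" == */scripts ]] && readme="../README.md"

status=0

check_section() {
    local section=$1
    local section_content duplicate_links
    # Lines between this "### <section>" heading and the next "### " heading.
    section_content=$(awk -v section="$section" '
        /^### / { p = ($0 == ("### " section)); next }
        p' "$readme")
    duplicate_links=$(echo "$section_content" | grep -oP '\[.*?\]\(\K[^)]+' | sort | uniq -d)
    if [[ -n $duplicate_links ]]; then
        echo "[ ERR ] DUPLICATE LINKS FOUND IN SECTION: $section"
        echo "$duplicate_links"
        status=1
    else
        echo "[ OK! ] NO DUPLICATE LINKS FOUND IN SECTION: $section"
    fi
}

# Read each heading as a whole line instead of letting the shell word-split it.
while IFS= read -r section; do
    check_section "$section"
done < <(grep '^### ' "$readme" | sed 's/^### //')

exit "$status"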

edoardottt commented on May 21, 2024

Sorry @milinddethe15, open a pull request with the code you are trying to push and we can discuss it better there. It's difficult to do this in comments.

edoardottt commented on May 21, 2024

Completed, thank you so much @milinddethe15 for your contribution!
