practical-data-science / ecommercetools

EcommerceTools is a Python data science toolkit for ecommerce, marketing science, and technical SEO analysis and modelling. It was created by Matt Clarke.

License: MIT License

Python 60.42% Jupyter Notebook 39.58%
customer customers ecommerce marketing marketing-analytics marketing-tools retail seo seo-optimization seotools

ecommercetools's People

Contributors

flyandlure, practical-data-science

ecommercetools's Issues

Feature request - Add support for other languages when calling google_autocomplete()

The google_autocomplete() function currently supports only en, because the expanded suggestion data are in English (and I don't reliably speak other languages). Extend support to other languages...

Here are some prefixes. I'm sending two lists: one with accents and one without. In Spanish, people tend to search without accents on desktop and with accents on mobile (because of auto-correction), so feel free to choose.

("quien es *", "qué es *","dónde", "dónde esta *", "cuándo *", "por qué *", "cómo *", "mejor", "barato", "peor", "es", "qué", "cuando", "quién","cuál es","cuánto",)

("quien es *", "que es *","donde", "donde esta *", "cuando *", "por que", "como *", "mejor", "barato", "peor", "es *", "que", "cuando", "quien","cual es","cuanto",)

Please do let me know if I can be of further help :)

Best,
Juan
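Since the two lists above differ only in accents, the unaccented variant could be derived programmatically rather than maintained by hand; a minimal sketch using only the standard library (strip_accents is a hypothetical helper, not part of the package):

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Decompose characters (NFD), then drop the combining marks,
    # turning "qué es *" into "que es *".
    return "".join(ch for ch in unicodedata.normalize("NFD", text)
                   if unicodedata.category(ch) != "Mn")

accented = ("quien es *", "qué es *", "dónde", "dónde esta *", "cuándo *")
unaccented = tuple(strip_accents(p) for p in accented)
```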

SKU counter in get_customers not counting unique SKUs

Hello, amazing tool!

I tried to use it for my business but found that the skus column (which I understood to be the number of unique SKUs purchased by each customer) does not aggregate as such.

For example, using .load_sample_data(), customer 12347 has 103 different SKUs, but the tool's skus column reports 7.

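One plausible cause of such a mismatch is aggregating with a row count rather than a distinct count. A minimal illustration of the distinction in pandas, on invented toy data (not the actual get_customers implementation):

```python
import pandas as pd

# Toy transactions; customer 12347 buys SKU "A" twice and "B" once.
orders = pd.DataFrame({
    "customer_id": [12347, 12347, 12347, 12348],
    "sku": ["A", "A", "B", "C"],
})

# nunique() counts distinct SKUs per customer ...
unique_skus = orders.groupby("customer_id")["sku"].nunique()
# ... whereas count() counts purchase lines (rows).
line_counts = orders.groupby("customer_id")["sku"].count()
```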

seo.query_google_search_console can't read JSON client secret file

I generated a JSON client secret from the Google API, then ran the Google Search Console command:

df = seo.query_google_search_console(key, site_url, payload)

The terminal responds:

Error: Service account info was not in the expected format, missing fields token_uri, client_email.
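That error typically means the downloaded file is an OAuth client secret rather than a service-account key, since only the latter contains token_uri and client_email. A quick, hedged check one could run before calling the function (missing_key_fields is a hypothetical helper, not part of the package):

```python
import json

# Fields a service-account key must contain; their absence produces the
# "missing fields token_uri, client_email" error from google-auth.
REQUIRED_FIELDS = {"client_email", "token_uri", "private_key"}

def missing_key_fields(path: str) -> set:
    """Return the required fields absent from a downloaded key file."""
    with open(path) as f:
        return REQUIRED_FIELDS - json.load(f).keys()
```

If client_email or token_uri are reported missing, download a service-account key from the Google Cloud console instead of the OAuth client secret.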

_get_xml() can fail when a server requires a user agent

Change seo.py...

import pandas as pd
import urllib.request
from urllib.request import Request
from urllib.parse import urlparse
from bs4 import BeautifulSoup


def _get_xml(url: str):
    """Scrapes an XML sitemap from the provided URL and returns XML source.
    Args:
        url (string): Fully qualified URL pointing to XML sitemap.
    Returns:
        xml (string): XML source of scraped sitemap.
    """

    try:
        response = urllib.request.urlopen(Request(url, headers={'User-Agent': 'Mozilla'}))
        xml = BeautifulSoup(response,
                            'lxml-xml',
                            from_encoding=response.info().get_param('charset'))
        return xml
    except Exception as e:
        print("Error: ", e)
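The key change above is attaching a User-Agent header to the request. As a quick check that the header really is set (note that urllib normalises header names internally, so it must be looked up as "User-agent"):

```python
from urllib.request import Request

# Build the same request the patched _get_xml() would send; the URL is
# just a placeholder for illustration.
req = Request("https://example.com/sitemap.xml",
              headers={"User-Agent": "Mozilla"})
# urllib capitalises header keys, so query with "User-agent".
assert req.get_header("User-agent") == "Mozilla"
```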

Notice of archival

Matt Clarke, the creator of this project and owner of this repo, has unfortunately passed away.

No new work will be happening in this repo, and it should be considered archived and read-only.

Forks are welcomed and encouraged to keep this project alive.

Thank you for your understanding.

seo.get_indexed_pages can't run

When I run this command:
from ecommercetools import seo

urls = ['https://www.bbc.co.uk']
df = seo.get_indexed_pages(urls)
print(df.head())

it responds:

Traceback (most recent call last):
File "....../get-index-value.py", line 7, in
df = seo.get_indexed_pages(urls)
File "....../lib/python3.8/site-packages/ecommercetools/seo/google_search.py", line 88, in get_indexed_pages
site_data = {'url': site, 'indexed_pages': _count_indexed_pages(site)}
File "....../lib/python3.8/site-packages/ecommercetools/seo/google_search.py", line 73, in _count_indexed_pages
return _parse_site_results(response)
File "....../lib/python3.8/site-packages/ecommercetools/seo/google_search.py", line 58, in _parse_site_results
indexed = int(string.split(' ')[1].replace(',', ''))
ValueError: invalid literal for int() with base 10: '43.500.000'

Need your help. Thanks.
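The traceback shows Google returning the result count with European thousands separators ('43.500.000'), which int() rejects after only commas are stripped. A minimal, hedged fix would be to drop every non-digit character before converting; parse_result_count is a hypothetical helper sketching the idea, not the package's code:

```python
import re

def parse_result_count(token: str) -> int:
    # Google localises the "About N results" figure, so the thousands
    # separator may be ",", "." or a space; keep only the digits.
    return int(re.sub(r"\D", "", token))

parse_result_count("43.500.000")  # -> 43500000
parse_result_count("43,500,000")  # -> 43500000
```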

response from _get_results(query) contains NoneType, which leads to a parsing failure

Hi Matt,

Trying to scrape Google, I followed your blog post on three-line Google scraping and got the following error:

AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 results = seo.get_serps("stupid")
      2 print(results)
File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:144, in get_serps(query, output)
    133 """Return the first 10 Google search results for a given query.
    134 
    135 Args:
   (...)
    140     results (dict): Results of query.
    141 """
    143 response = _get_results(query)
--> 144 results = _parse_search_results(response)
    146 if results:
    147     if output == "dataframe":

File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:124, in _parse_search_results(response)
    118 output = []
    120 for result in results:
    121     item = {
    122         'title': result.find(css_identifier_title, first=True).text,
    123         'link': result.find(css_identifier_link, first=True).attrs['href'],
--> 124         'text': result.find(css_identifier_text, first=True).text
...
    125     }
    127     output.append(item)
    129 return output

AttributeError: 'NoneType' object has no attribute 'text'

Then I tried your other blog post on scraping with Python, which does not rely on the ecommercetools package, and followed it to the letter.
Here is the interesting part:

results = google_search("stupid")
results

yields normal output. Rerunning this (Jupyter cell) with the keyword

results = google_search("allergy")
results

yields

AttributeError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 results = google_search("allergy")
      2 results

Cell In[8], line 3, in google_search(query)
      1 def google_search(query):
      2     response = get_results(query)
----> 3     return parse_results(response)

Cell In[7], line 17, in parse_results(response)
     10 output = []
     12 for result in results:
     14     item = {
     15         'title': result.find(css_identifier_title, first=True).text,
     16         'link': result.find(css_identifier_link, first=True).attrs['href'],
---> 17         'text': result.find(css_identifier_text, first=True).text
     18     }
     20     output.append(item)
     22 return output

AttributeError: 'NoneType' object has no attribute 'text'

So sometimes result.find(css_identifier_text, first=True) yields None.
I have no idea under which circumstances this None arises, but the behaviour is as follows:
seo.get_serps() from ecommercetools consistently throws the error, while the hand-written equivalent is keyword-sensitive; e.g. "allergy" throws the error but "keyword sensitive" does not.
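find(..., first=True) returns None whenever the CSS selector matches nothing, and some SERP result blocks (featured snippets, for instance) have no description element, so dereferencing .text fails for those keywords. A hedged sketch of a defensive version of the loop, with parse_search_results as a hypothetical stand-in for the package's _parse_search_results:

```python
def parse_search_results(results, css_title, css_link, css_text):
    output = []
    for result in results:
        title = result.find(css_title, first=True)
        link = result.find(css_link, first=True)
        text = result.find(css_text, first=True)
        if title is None or link is None:
            continue  # skip result blocks missing a title or link
        output.append({
            'title': title.text,
            'link': link.attrs['href'],
            # fall back to an empty string when no description exists
            'text': text.text if text is not None else '',
        })
    return output
```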

Scikit dependency deprecated in pip

There's a dependency error when pip tries to install the sklearn package. The dependency should be updated to scikit-learn.

× Building wheel for sklearn (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
rather than 'sklearn' for pip commands.

  Here is how to fix this error in the main use cases:
  - use 'pip install scikit-learn' rather than 'pip install sklearn'
  - replace 'sklearn' by 'scikit-learn' in your pip requirements files
    (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
  - if the 'sklearn' package is used by one of your dependencies,
    it would be great if you take some time to track which package uses
    'sklearn' instead of 'scikit-learn' and report it to their issue tracker
  - as a last resort, set the environment variable
    SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error
  
  More information is available at
  https://github.com/scikit-learn/sklearn-pypi-package
  
  If the previous advice does not cover your use case, feel free to report it at
  https://github.com/scikit-learn/sklearn-pypi-package/issues/new
  [end of output]
