microsoft / pubsec-info-assistant

Information Assistant, built with Azure OpenAI Service, Industry Accelerator

License: MIT License

Dockerfile 0.62% Shell 11.08% Makefile 0.45% Python 40.80% HTML 0.39% TypeScript 26.46% CSS 7.49% HCL 12.71%

pubsec-info-assistant's Issues

Deploying vNext-Dev error

When attempting to deploy the latest version of the vNext-Dev branch, an error occurs that prevents the deployment from completing.

Failed to parse main.parameters.json with exception:
Failed to parse 'main.parameters.json', please check whether it is a valid JSON format

The error appears to be caused by a missing value in main.parameters.json when it is written out during the build process:

"chatGptDeploymentCapacity": {
  "value":
},
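A quick way to confirm the malformed file before deploying is to run it through a JSON parser. A minimal sketch (the file name comes from the error above; the helper itself is hypothetical):

```python
import json

def check_json_file(path):
    """Return an error description if the file is not valid JSON, else None."""
    with open(path) as f:
        try:
            json.load(f)
        except json.JSONDecodeError as e:
            # e.lineno points at the offending line, e.g. the empty "value":
            return "line {}: {}".format(e.lineno, e.msg)
    return None

# Example: check_json_file("main.parameters.json")
```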

This may be related to issue #282.

URL/Website content extraction

Feature request
Many federal customers have public documents (PDFs) and websites (including FAQs) that they would like to search using Info-Assistant.

Additional Details
Support crawling and extracting content from URL/website with recursion up to a certain configurable depth. Also provide support for filtering out certain URLs like forms, pages that call APIs (like office locator and such) and/or certain domains.

First makedeploy error: Error while attempting to download Bicep CLI

When running make deploy the first time, I got this error:
[Errno 1] Operation not permitted: '/home/vscode/.azure/bin/bicep'

Then I ran make deploy again and got another error:
InvalidTemplate - Deployment template validation failed: 'The template resource 'infoasst-aoai-pwrsg/' for type 'Microsoft.CognitiveServices/accounts/deployments' at line '1' and column '2133' has incorrect segment lengths. A nested resource type must have identical number of segments as its resource name. A root resource type must have segment length one greater than its resource name

Azure Sample Estimation: AI Document Intelligence is picking up large SKU

Describe the bug
When using the AI Document Intelligence service to estimate the Azure sample costs, the service selects the large SKU option by default. This results in an overestimation of the Azure costs for a sandbox environment.

To Reproduce

Expected behavior
The expected behavior is that the service should either select the default or the smallest SKU option, or prompt the user to choose the SKU size if the document does not provide it.

Screenshots

Additional context
Add any other context about the problem here.

WebApp Deployment Failed: Diagnostic settings does not support retention for new diagnostic settings.

Describe the bug
Diagnostic settings does not support retention for new diagnostic settings. (Code: BadRequest, Target: /subscriptions//resourceGroups/infoasst-ws-peter/providers/Microsoft.Resources/deployments/web)

Complete log
ERROR: {"status":"Failed","error":{"code":"DeploymentFailed","target":"/subscriptions//providers/Microsoft.Resources/deployments/infoasst-ws-peter","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"ResourceDeploymentFailure","target":"/subscriptions/1/resourceGroups/infoasst-ws-peter/providers/Microsoft.Resources/deployments/web","message":"The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'.","details":[{"code":"DeploymentFailed","target":"/subscriptions//resourceGroups/infoasst-ws-peter/providers/Microsoft.Resources/deployments/web","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"BadRequest","target":"/subscriptions/1/resourceGroups/infoasst-ws-peter/providers/Microsoft.Resources/deployments/web","message":"{\r\n "code": "BadRequest",\r\n "message": "Diagnostic settings does not support retention for new diagnostic settings."\r\n}"}]}]}]}}
make: *** [Makefile:18: infrastructure] Error 1

To Reproduce
make deploy

Expected behavior
No error message should appear.
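Azure no longer accepts retention policies on newly created diagnostic settings; retention is configured on the destination (e.g. the Log Analytics workspace) instead. A hedged Bicep sketch of the shape that deploys cleanly (resource and symbol names are assumptions, not the repo's actual template):

```bicep
resource diag 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'diag'
  scope: webApp
  properties: {
    workspaceId: logAnalytics.id
    logs: [
      {
        category: 'AppServiceAppLogs'
        enabled: true
        // No retentionPolicy block: retention on new diagnostic
        // settings is rejected with the BadRequest above.
      }
    ]
  }
}
```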

Beta version details

  • GitHub branch: [e.g. main]

Error when GPT3-5-16k is not available in your region

Describe the bug
(screenshot)

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Alpha version details

  • GitHub branch: [e.g. main]
  • Latest commit: [obtained by running git log -n 1 <branchname>]

Additional context
Add any other context about the problem here.

GitHub Broken Links

Describe the bug
Users are reporting broken links while navigating to the documentation and other GitHub links.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '[Readme.md](https://github.com/docs/development_environment.md)'
  2. Click on 'URLs within the readme.md file'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
(screenshot)

Additional context
Add any other context about the problem here.

Problems with foreign characters

Describe the bug
Foreign characters (Danish "æøå" in this case) are incorrectly displayed in "Citation -> Document section -> Content". For instance, "å" is shown as "Ã¥". This is the classic symptom of UTF-8 bytes being decoded with a single-byte encoding such as Latin-1.

To Reproduce
Steps to reproduce the behavior:

  • Upload an HTML document stored in UTF-8 with BOM. In that file, foreign characters are not escaped (and shouldn't be, because it is UTF-8).
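The reported rendering ("å" shown as "Ã¥") can be reproduced in one line: encode the character as UTF-8, then decode those bytes as Latin-1.

```python
# "å" is the bytes 0xC3 0xA5 in UTF-8; read back as Latin-1,
# those two bytes become the two characters "Ã" and "¥".
mojibake = "å".encode("utf-8").decode("latin-1")
print(mojibake)  # Ã¥
```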

Expected behavior
The user should be presented with the proper characters in the UI.

Alpha version details

  • GitHub branch: main (gamma)
  • Latest commit: 594c965

Additional context

Text Enrichment function not quoting blob paths correctly

We have some files with percentage (%) symbols in them, which appear to cause an issue when getting to the Text Enrichment stage of the Function App due to the way the get_blob_and_sas function works. Example file name: Unemployment rate back up to 3.7% in October _ Australian Bureau of Statistics.pdf

I would suggest replacing the code that manually substitutes spaces (below) with a proper URL-quoting function such as blob_path = urllib.parse.quote(blob_path)

source_blob_path = source_blob_path.replace(" ", "%20")
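To illustrate the suggestion: manual space substitution leaves a literal "%" in the file name unescaped, while urllib.parse.quote escapes both (the example path below is hypothetical):

```python
import urllib.parse

blob_path = "upload/Unemployment rate back up to 3.7% in October.pdf"

# Manual substitution: the raw '%' survives and corrupts the URL.
manual = blob_path.replace(" ", "%20")

# Proper quoting: spaces become %20 and '%' becomes %25;
# '/' is kept as a path separator by default (safe='/').
quoted = urllib.parse.quote(blob_path)
print(quoted)
```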

The ability (or instructions) on how to remove documents from the system

In our testing of the accelerator we have noticed that some documents become dominant, and/or are uploaded in error and skew the results. Currently the only way to resolve this appears to be to delete the whole accelerator instance and start again. While this is OK during development iterations, once we get to a "Production" instance the ability to manage the data will be critical. Could we get the ability to remove documents from the 'content library' in a future build?

Tags can only contain ASCII, fails silently

Describe the bug
Adding tags containing non-ASCII characters when uploading files results in an error.
In the console:
deserializationPolicy.js:164 Uncaught (in promise) RestError: The metadata specified is invalid. It has characters that are not permitted.

The reason is that Azure blob storage metadata can only contain ASCII.

To Reproduce

  1. Go to Manage content, Upload files
  2. Add a tag: rødgrød
  3. Drop a file in the square.

The file is written to the log-database ("uploaded"), but it never actually reaches the blob storage.

Expected behavior

  • At least display an error in the UI.
  • The tag should be encoded in some way and decoded when in use.
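One workaround sketch for the second bullet (an assumption, not the accelerator's code): percent-encode tags before writing them as blob metadata, which makes them ASCII-safe and fully reversible.

```python
import urllib.parse

def encode_tag(tag):
    # Percent-encoding yields pure ASCII, which blob metadata accepts.
    return urllib.parse.quote(tag, safe="")

def decode_tag(encoded):
    return urllib.parse.unquote(encoded)

print(encode_tag("rødgrød"))  # r%C3%B8dgr%C3%B8d
```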

Alpha version details

  • GitHub branch: main

Unable to deploy environment.

Describe the bug
Unable to deploy the accelerator. I am following the instructions. I successfully created the env file.
I also created workspace and resource group on the Azure Subscription.

To Reproduce
Steps to reproduce the behavior:
When I try to build environment, I get an error.

Expected behavior
Infrastructure should be deployed.

Screenshots
(screenshot)

Desktop (please complete the following information):
Windows 11, Visual Studio Code

What am I doing wrong?

ERROR: failed to export: exporting app layers: caching layer

Describe the bug
After running azd up, it creates the necessary Docker images and everything seems fine until this last step, where I received this error:

"
[exporter] ERROR: failed to export: exporting app layers: caching layer (sha256:97e68c8f07ff95f2a9c3d994aa8fb9800cda648748531e3d9f595e53c29e9351): write /launch-cache/staging/sha256:97e68c8f07ff95f2a9c3d994aa8fb9800cda648748531e3d9f595e53c29e9351.tar: no space left on device
, stderr: [builder]
[builder] [notice] A new release of pip is available: 23.2.1 -> 23.3.1
[builder] [notice] To update, run: pip install --upgrade pip
ERROR: failed to build: executing lifecycle: failed with status code: 62
"

I have over 300GB free space left on my Mac.

Desktop (please complete the following information):

  • OS: MacOS Sonoma on Apple Silicon

Adding html-files with inline images (base64-encoded) pollutes search index

Describe the bug
I'm trying to add html-files with images by specifying the image data as base64-encoded in the src-attributes of the image tag.

While this works for the document preview, it seems that the image data is included in the search index, where I don't think it will be of any use and will probably cause a lot of noise.

Example of document (cropped) from the search index:

 "translated_text": "{\n ...<img data-sp-prop-name=\\\"imageSource\\\" src=\\\"data:image/png;base64,  iVBORw0KGgoAAAANSUhEUgAAA68AAAIICAYAAACB9tgLAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMA...

I chose this approach as I believe this is what Mammoth does with docx-files.
If this is not how images should be supplied, please advise.
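A possible pre-processing sketch (hypothetical, not part of the accelerator) that blanks inline data: URIs before the HTML reaches the indexer, while the original file can still be kept for preview:

```python
import re

def strip_inline_images(html):
    # Replace base64 data URIs in src attributes with an empty src
    # so the payload never reaches the search index.
    return re.sub(r'src="data:[^"]*"', 'src=""', html)

html = '<p>Intro</p><img src="data:image/png;base64,iVBORw0KGgo=">'
print(strip_inline_images(html))
```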

To Reproduce
Steps to reproduce the behavior:
Upload html with embedded, inline images.

Expected behavior
Images will be shown in the document preview. The search index will be lean and not contain the base64-encoded data.

Alpha version details

  • GitHub branch: Main / gamma
  • Latest commit: deb6408

Additional context
Add any other context about the problem here.

Container didn't respond to HTTP pings on port: 8000

Discussed in #285

Originally posted by andresravinet October 18, 2023
Can anyone help me? I'm getting an Application Error when browsing to the web app. And the below is what is showing up in the log.
Container infoasst-web-6cq0e_0_17db8b46 didn't respond to HTTP pings on port: 8000, failing site start. See container logs for debugging.

Web app doesn't start when a new AOAI resource is used

Describe the bug
With the new Delta version, the web app doesn't start when a new AOAI resource is used.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy a brand new Delta version with new AOAI resource (USE_EXISTING_AOAI=false)
  2. The web app expects a model deployment named "chat", but the default deployment name is "gpt-35-turbo-16k"

Expected behavior
Web app up and running :)

Delta version details

  • GitHub branch: main
  • Latest commit: 0cc8d71

Deployment Error: Insufficient quota - GPT-35-Turbo-16K

Describe the bug
Users are facing issues while deploying "gpt-35-turbo-16k" because the default capacity is set to 720.

To Reproduce
Steps to reproduce the behavior:

  1. Run "make deploy"

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
(screenshot)

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

Running the App Locally - Local Development

The app is deployed and working fine. But when trying to run it locally from the 'backend' folder, it shows an error like this:

Traceback (most recent call last):
  File "/workspaces/PubSec-Info-Assistant/app/backend/app.py", line 58, in <module>
    azure_search_key_credential = AzureKeyCredential(AZURE_SEARCH_SERVICE_KEY)
  File "/home/vscode/.local/lib/python3.10/site-packages/azure/core/credentials.py", line 67, in __init__
    raise TypeError("key must be a string.")
TypeError: key must be a string.

Please note that the infrastructure.env file is created with AZURE_SEARCH_SERVICE_KEY and other infrastructure credentials. Also, on a side note, 'AllMetrics', 'AppServicePlatformLogs' and 'AppServiceAppLogs' were disabled during deployment because of a conflict/error, potentially due to the subscription.
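The TypeError means AzureKeyCredential received None, i.e. AZURE_SEARCH_SERVICE_KEY was not present in the process environment when the app started. A minimal guard sketch (assuming the key is read via os.environ; the helper is hypothetical) that fails with a clearer message:

```python
import os

def get_search_key():
    key = os.environ.get("AZURE_SEARCH_SERVICE_KEY")
    if not key:
        # Fail fast with an actionable message instead of a TypeError
        # raised deep inside azure.core.
        raise RuntimeError(
            "AZURE_SEARCH_SERVICE_KEY is not set; "
            "load infrastructure.env into the shell before starting the backend"
        )
    return key
```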

Error when using danish texts

Describe the bug
When uploading Danish documents, they are not processed. If the documents are in English, they are successfully processed.

To Reproduce

  1. Upload a Danish text.
  2. In the upload status of the web app, the document appears as "Completed".
Screenshot 2023-11-03 at 07 59 44
  3. The chat does not use the document to answer.
  4. Going to the Azure Portal and inspecting the Search Service, we see that the 'Indexers' file all-files-indexers is returning an error.
Screenshot 2023-11-03 at 07 51 24
  5. The error is not very informative.
  6. However, if we create a 'Debug session' in the Search Service on the Danish PDF, we see more information:

Screenshot 2023-11-03 at 07 53 27
  7. So it seems like the PII skill does not support the Danish language, but the documentation describes that it does: https://learn.microsoft.com/en-us/azure/ai-services/language-service/personally-identifiable-information/language-support?tabs=documents
  8. We have tried changing the PII language in the code to EN (in the language specification file), but it is still the same error.

Expected behavior
Danish files should work.

Desktop (please complete the following information):

  • Azure environment
  • Browser: chrome and safari

Alpha version details

  • GitHub branch: 0.3 Main
  • Latest commit: 0f9d4f6

Citations not showing 100% of the time

Detailed Description
Citations are not showing 100% of the time in responses. This is more prevalent in GPT 3.5 with chat completion API (more likely to not get citation versus get citation). Citations are more consistent in GPT 4.

Expected behavior
Any response that is not a 'I do not have enough information' should have citations for the answer.

Version details

  • All

Deployment fails with a media service error in northcentralus

FYI - when I tried to deploy to LOCATION="northcentralus" I got the following error for the media service:

{
    "status": "Failed",
    "error": {
        "code": "BadRequest",
        "message": "Creation of new Media Service accounts are not allowed as the resource has been deprecated.",
        "details": [
            {
                "code": "MediaServiceAccountCreationDisabled",
                "message": "Creation of new Media Service accounts are not allowed as the resource has been deprecated."
            }
        ]
    }
}

Changing to LOCATION="eastus" deployed without errors.

Inconsistencies in File Upload -> Queued/Processing

We're having issues in our instance with uploading files, Word & PDF docs. Some of our test docs are only 15-20 KB, some PDFs 3-5 MB.

File upload appears to work (the upload bar goes grey with a green tick), but the file doesn't turn up in the queue in Upload Status. This works sometimes (we have a mix of 15-20 files uploaded) but sometimes not (seemingly most of the time over the past week or so).

Various users, different times of the day.

We have adjusted the Cognitive Search Indexer to a 5min interval, but I don't think it's even making it into the queue.

Is anyone having a similar issue, or does anyone have thoughts on things to check/fix, please?

++++

release: 0.3 Gamma

Instance: infoasst-aoai-3z1n0
Deployment Name: gpt-35-turbo-16k
Model Name: gpt-35-turbo-16k
Model Version: 0613

Azure Cognitive Search
Service Name: infoasst-search-3z1n0
Index Name: all-files-index

System Configuration
System Language: English


Desktop
Windows 11 23H2
Edge: 119.0.2151.12
... but issue is affecting various other users/desktop/browser combinations

Errors with new GPT-4 and GPT-3.5 versions

When attempting to use GPT-4 (0613) or GPT-3.5 (0613), the following error comes up:
The completion operation does not work with the specified model, gpt-4. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.

This looks like a code-base error, as the Completion operation is not supported by the gpt-35-turbo (0613) and gpt-35-turbo-16k (0613) models. These models only support the Chat Completions API. Only the older model, GPT-3.5 Turbo (0301), supports both the Chat and Completions APIs. Please refer to the GPT-3.5 models documentation for details.

On top of that, the accelerator is also using the old GPT-3.5 version, 0301. We need to switch to the new version, 0613, which yields better results. This will require the step above as well.

This should be configurable and not hard-coded in the Bicep.

Reference to the Bicep code https://github.com/microsoft/PubSec-Info-Assistant/blob/deb64086a5b1c5b720fdd6e7dbded05db7f8d2cc/infra/main.bicep#L180C9-L187C46

model: {
  format: 'OpenAI'
  name: chatGptModelName
  version: '0301'
}
sku: {
  name: 'Standard'
  capacity: chatGptDeploymentCapacity
}
Action required: self-attest your goal for this repository

It's time to review and renew the intent of this repository

An owner or administrator of this repository has previously indicated that this repository can not be migrated to GitHub inside Microsoft because it is going public, open source, or it is used to collaborate with external parties (customers, partners, suppliers, etc.).

Action

👀 ✍️ In order to keep Microsoft secure, we require repository owners and administrators to review this repository and regularly renew the intent to either opt-in or opt-out of migration to GitHub inside Microsoft which is specifically intended for private or internal projects.

❗Only users with admin permission in the repository are allowed to respond. Failure to provide a response will result in your repository getting automatically archived. 🔒

Instructions

❌ Opt-out of migration

If this repository can not be migrated to GitHub inside Microsoft, you can opt-out of migration by replying with a comment on this issue containing one of the following optout command options below.

@gimsvc optout --reason <staging|collaboration|delete|other>

Example: @gimsvc optout --reason staging

Options:

  • staging : My project will ship as Open Source
  • collaboration : Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
  • delete : This repository will be deleted because it is no longer needed.
  • other : Other reasons not specified

✅ Opt-in to migrate

If the circumstances of this repository have changed and you decide that you need to migrate, then you can use the optin command below, e.g. if the repository is no longer going public, going open source, or requiring external collaboration.

@gimsvc optin --date <target_migration_date in mm-dd-yyyy format>

Example: @gimsvc optin --date 03-15-2023

Click here for more information about optin and optout command options and examples

Opt-in

@gimsvc optin --date <target_migration_date>

When opting-in to migrate your repository, the --date option is required followed by your specified migration date using the format: mm-dd-yyyy

@gimsvc optin --date 03-15-2023

Opt-out

@gimsvc optout --reason <staging|collaboration|delete|other>

When opting-out of migration, you need to specify the --reason.

  • staging
    • My project will ship as Open Source
  • collaboration
    • Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
  • delete
    • This repository will be deleted because it is no longer needed.
  • other
    • Other reasons not specified

Examples:

@gimsvc optout --reason staging

@gimsvc optout --reason collaboration

@gimsvc optout --reason delete

@gimsvc optout --reason other

Need more help? 🖐️

Token limit often exceeded with PDF files

We have some large PDF files and during the chunking process, it seems to be often creating chunks that well exceed the "target size". For example, one document (which can be downloaded here), has one chunk over 80,000 tokens in length.

There are several other chunks created from this same file that are smaller but still exceed the target size by a substantial amount.
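Independent of where the chunker goes wrong, the invariant being asked for is simple: no chunk may exceed the target size, regardless of document structure. A toy sketch using whitespace tokens as a stand-in for model tokens (a real pipeline would count tokenizer tokens, not words):

```python
def chunk_text(text, max_tokens):
    # Crude whitespace split as a stand-in for a model tokenizer;
    # the point is the hard cap, enforced unconditionally.
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]

print(chunk_text("one two three four five", 2))
# ['one two', 'three four', 'five']
```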

OpenAPI specification for backend

Is your feature request related to a problem? Please describe.
Some customers would like to use only the backend part of IA and bring their own frontend. Also there is a requirement for APIM integration.

Describe the solution you'd like
OpenAPI (Swagger) specification for the chat endpoint and other endpoints, in the same way as the enrichment web app.

"Error: Access denied due to invalid subscription key or wrong API endpoint."

After provisioning the PubSec Accelerator as per the installation instructions, the chat function doesn't work and returns an error "Error: Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource."

I receive the same error when testing a solution deployed into Australia East and also East US.

I am able to log into the Azure OpenAI Studio using the same instance that was provisioned and use the Chat feature there without any issues.

Screenshot 2023-08-16 at 8 16 27 am

In the configuration section of the WebApp, the service details are blank by default:

{
  "name": "AZURE_OPENAI_SERVICE",
  "value": "",
  "slotSetting": false
},
{
  "name": "AZURE_OPENAI_SERVICE_KEY",
  "value": "",
  "slotSetting": false
},
These settings appear to be the cause: when I set them to either the newly created endpoint or an existing endpoint, along with the key value, different errors are displayed on querying the chat:

Error: Error communicating with OpenAI: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: //infoasst-aoai-w8ilp.openai.azure.com/.openai.azure.com//openai/deployments/gpt-35-turbo/completions?api-version=2023-06-01-preview (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x761c54bd59f0>: Failed to resolve 'https' ([Errno -2] Name or service not known)"))

Error: The completion operation does not work with the specified model, gpt-35-turbo. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.

In both instances above I had used the 'What impact does China have on climate changes' as the query.

Installer seems to ignore authentication setting

Describe the bug
Post-installation, the application requires the end user to add themselves as an authorized user.

To Reproduce
Steps to reproduce the behavior:

  1. Install from main (Delta) with authentication set to false:

Screenshot 2023-11-22 124757
  2. Find the website as instructed in the documentation.
  3. Note that an error is received stating that the application has been configured to block users unless they are specifically granted access.

Screenshot 2023-11-22 124909

Expected behavior

Authentication setting should be honored.

Version details

  • GitHub branch: main (Delta)

Support for M365 (SharePoint Online, OneDrive) content

Issue at hand
Almost all federal customers have a big M365 presence, and have lots of documentation in SharePoint Online (SPO) and OneDrive. A few of my customers (VA, BEA, and NIH) have docs in SPO.

A possible solution
Extract and index documents and pages (site contents) in SPO and OneDrive using Graph SDK/REST

Describe alternatives you've considered
Another way is to use Logic Apps to get data from SPO, store it in blob storage, and then use Cognitive Search. It is very painful and adds complexity like data sync issues, duplicate storage, access, and retention.

PDF Tables with RowSpan and ColSpan not interpreted correctly

Describe the bug
I have attached a PDF (publicly available) here. On Page 3 there is a table for VA pension benefits. For the question 'Can you tell me the full eligibility rules for receiving VA pension", the answer returned is incorrect. I have attached a screenshot of the issue.

To Reproduce
Steps to reproduce the behavior:

  1. Install the vNext-Dev version of the Information Assistant using bge embedding model
  2. Upload the attached pdf file
  3. Ask the question - "Can you tell me the full eligibility rules for receiving VA pension" and see the error as indicated in the attachment below.

Expected behavior
A clear explanation of eligibility conditions that should match ALL the content on Page 3 of the attached document.

Screenshots

Table Parsing

Alpha version details

  • GitHub branch: vNext-Dev

Additional context
summaryofvanationalguardandreserve.pdf

MacOS / bash 5.2 compatibility updates

MacOS 13.5, M1 Pro.

deploy-enrichment-webapp.sh, deploy-webapp.sh:

# original
# end=`date -u -d "3 years" '+%Y-%m-%dT%H:%MZ'`

# Bash 5.2 - friendly
end=$(date -u -v+3y '+%Y-%m-%dT%H:%MZ')

inf-create.sh:

# original
#randomString=$(mktemp --dry-run XXXXX)
#randomString="${randomString,,}"

# Bash 5.2 - friendly
randomString=$(mktemp -u XXXXX)
randomString=$(echo "$randomString" | tr '[:upper:]' '[:lower:]')
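For scripts that must run on both GNU/Linux and macOS, one portable pattern (a sketch, not the repo's current code) is to try the GNU form and fall back to the BSD form:

```shell
# GNU date first (-d), BSD/macOS date (-v) as fallback
end=$(date -u -d "3 years" '+%Y-%m-%dT%H:%MZ' 2>/dev/null \
      || date -u -v+3y '+%Y-%m-%dT%H:%MZ')

# mktemp -u and tr behave the same on both platforms
randomString=$(mktemp -u XXXXX | tr '[:upper:]' '[:lower:]')
echo "$end $randomString"
```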

Azure Function not running for uploaded files making them unavailable to chat with

Describe the bug
I am receiving an error message when asking my Accelerator questions. Pictures are included for reference on this issue.

Screenshot 2023-08-07 095337 Screenshot 2023-08-07 101836 Screenshot 2023-08-07 101920 Screenshot 2023-08-07 104048

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error
    Error message show in screenshots when asking a question as well as when chatting.

Expected behavior
I expected to receive an answer to my question referencing the PDF documents I had uploaded to my accelerator; however, each time I am met with an error message.
Screenshots
Included above.
Desktop (please complete the following information):

  • OS: Windows 11 Enterprise, 22H2
  • Browser: Microsoft Edge

Alpha version details

  • GitHub branch: [e.g. main]
  • Latest commit: [obtained by running git log -n 1 <branchname>]

Additional context
Add any other context about the problem here.

Sizing Estimator - Missing costestimator.md in the docs directory

Describe the bug
The main README.md has a section on the Sizing Estimator, which points to a non-existent document, costestimator.md, in the docs folder.

Alpha version details

  • GitHub branch: main
  • Latest commit: [obtained by running git log -n 1 <branchname>]
$ git log -n 1 main
commit b5dc8b368758b479b4f94b061b43c4b8ac94cb00 (HEAD -> main, origin/main, origin/HEAD)
Merge: ca99857 797ff84
Author: dayland <[email protected]>
Date:   Fri Jun 23 08:11:16 2023 +0000

    Merge pull request #101 from microsoft/dayland/5707-missing-search-indexer-property
    
    add "allowSkillsetToReadFileData" property to search indexer

Support multiple front ends (Desktop, PVA, Power Platform, D365, Copilots) using APIs or an SDK

Is your feature request related to a problem? Please describe.
Currently, IA is a website. If it is needed in other platforms (Desktop, Power Platform, SharePoint), the only way is to use an IFrame, which is not preferred.

Possible solution
Abstract all functionality into an API layer (or possibly an SDK) with API security. That way, IA becomes a hosting-platform-agnostic solution and can be called by anybody.
By doing so, I can leverage IA's ingestion process only and mix and match.

InvalidTemplate - Deployment template validation failed - During infrastructure deploy.

Describe the bug

The configuration value of bicep.use_binary_from_path has been set to 'false'.
Successfully installed Bicep CLI to "/home/vscode/.azure/bin/bicep".
InvalidTemplate - Deployment template validation failed: 'The template resource 'infoasst-aoai-9qt5q/' for type 'Microsoft.CognitiveServices/accounts/deployments' at line '1' and column '2146' has incorrect segment lengths. A nested resource type must have identical number of segments as its resource name. A root resource type must have segment length one greater than its resource name. Please see https://aka.ms/arm-syntax-resources for usage details.'.
make: *** [Makefile:18: infrastructure] Error 1

To Reproduce
Steps to reproduce the behavior:

  1. az login
  2. make build
  3. make deploy

Expected behavior
Expected to see a successful deploy result.

Screenshots

(screenshot)

Desktop (please complete the following information):
OS: Codespace - Debian GNU/Linux 11 (bullseye),
Host OS: Ubuntu 22.04.02 LTS
Browser: Version 114.0.5735.198 (Official Build) (64-bit)

Alpha version details

  • GitHub branch: main
  • Latest commit: [obtained by running git log -n 1 <branchname>]
@fitchtravis ➜ /workspaces/PubSec-Info-Assistant (main) $ git log -n 1 main
commit b5dc8b368758b479b4f94b061b43c4b8ac94cb00 (HEAD -> main, origin/main, origin/HEAD)
Merge: ca99857 797ff84
Author: dayland <[email protected]>
Date:   Fri Jun 23 08:11:16 2023 +0000

    Merge pull request #101 from microsoft/dayland/5707-missing-search-indexer-property
    
    add "allowSkillsetToReadFileData" property to search indexer

Additional context
Happens during a make infrastructure

Fail to process documents due to missing language pickle file

Describe the bug
Upon installation, uploaded documents fail to process, with an error message and a 200 response code in the logs:
Screenshot 2023-11-22 093007

Upon investigation, it appears that the download of the English language file has failed, though the ZIP file is present:

Screenshot 2023-11-22 112703

Per other deployments, this ZIP file should have been extracted. During runtime, the service attempts to download the file again, but then sees the ZIP file and stops. The likely root cause is the file not being extracted.

Note: the built-in gzip tool is unable to decompress it, as it does not recognize the ".zip" extension (gzip handles .gz streams, not .zip archives). It is unknown whether that is related to the file not being extracted.
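Since gzip only understands single-member .gz streams, a .zip archive needs zipfile (or the unzip CLI). A stdlib sketch for extracting the language file (paths hypothetical):

```python
import zipfile

def extract_archive(archive_path, dest_dir):
    # .zip archives are handled by zipfile, not gzip.
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
```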

Version details

  • GitHub branch: main (Delta)

  • Latest commit:

    Merge pull request #351 from microsoft/ryonsteele/6258-deploymentname-documentation

    Update default bicep params to use default flavor model for deployment name

Additional context
Add any other context about the problem here.

Action required: migrate or opt-out of migration to GitHub inside Microsoft

Migrate non-Open Source or non-External Collaboration repositories to GitHub inside Microsoft

In order to protect and secure Microsoft, private or internal repositories in GitHub for Open Source which are not related to open source projects or do not require collaboration with 3rd parties (customers, partners, etc.) must be migrated to GitHub inside Microsoft, a.k.a. GitHub Enterprise Cloud with Enterprise Managed Users (GHEC EMU).

Action

✍️ Please RSVP to opt-in or opt-out of the migration to GitHub inside Microsoft.

❗Only users with admin permission in the repository are allowed to respond. Failure to provide a response will result in your repository getting automatically archived. 🔒

Instructions

Reply with a comment on this issue containing one of the following optin or optout command options below.

✅ Opt-in to migrate

@gimsvc optin --date <target_migration_date in mm-dd-yyyy format>

Example: @gimsvc optin --date 03-15-2023

OR

❌ Opt-out of migration

@gimsvc optout --reason <staging|collaboration|delete|other>

Example: @gimsvc optout --reason staging

Options:

  • staging : My project will ship as Open Source
  • collaboration : Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
  • delete : This repository will be deleted because it is no longer needed.
  • other : Other reasons not specified

Need more help? 🖐️

Model text-davinci-003 not supported

Describe the bug
Deployment fails because the model is not supported. I tried to create the model in Azure OpenAI Studio and it does not exist there.

This is the error I'm getting...

🎯 Target Resource Group: infoasst-myworkspct

InvalidTemplateDeployment - The template deployment 'infoasst-myworkspct' is not valid according to the validation procedure. The tracking id is '48674f37-c551-4d33-aa0f-017697b7dd2d'. See inner errors for details.
DeploymentModelNotSupported - Creating account deployment is not supported by the model 'text-davinci-003'. This is usually because there are better models available for the similar functionality.
make: *** [Makefile:18: infrastructure] Error 1

Is there a workaround?

Answers do not appear to be created from chunks returned

Describe the bug
You may ask the system a question at the start of a chat (or mid-chat) and notice that the answer provided is not based on data contained within the response chunks sent to the OpenAI service. This can happen whether the answer given is correct or incorrect; it is unrelated to the accuracy of the answer.

To Reproduce
Steps to reproduce the behavior:

  1. Ask a question that the system does not have information in documents to answer, such as "I use my boat for personal purposes only. Can I deduct the costs of the boat from my taxes?"
  2. Note that the response may be factual, and may cite a document.
  3. Review the cited document and see that the data required is nowhere in the document.
    a. Specific document used: https://files.taxfoundation.org/20210823155834/TaxEDU-Primer-Common-Tax-Questions-Answered.pdf
  4. Review the Thought Process to confirm that no chunk provided contained the answer

Expected behavior
The system should say that it is unable to answer the question, and not cite any document or document chunk.
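As an illustration of the guard this expected behavior implies, a naive lexical overlap check could flag answers that no retrieved chunk supports. This is a sketch only, not the system's logic; a production fix would need a proper groundedness or entailment check (e.g. an LLM-based verifier):

```python
def answer_grounded(answer: str, chunks: list[str], threshold: float = 0.6) -> bool:
    """Illustrative heuristic: fraction of distinctive answer words that
    appear in at least one retrieved chunk. Below the threshold, the system
    should refuse to answer rather than cite an unrelated document."""
    words = {w.strip(".,?!").lower() for w in answer.split() if len(w) > 4}
    if not words:
        return False
    lowered = [c.lower() for c in chunks]
    hits = sum(1 for w in words if any(w in c for c in lowered))
    return hits / len(words) >= threshold
```

A real implementation would run on the same chunks shown in the Thought Process pane, so an ungrounded answer could be replaced with an "I don't know" response before any citation is attached.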

Screenshots

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser: Edge

Alpha version details

  • GitHub branch: main
  • Tag

Additional context
Tracked in AzDO 5791

Deployment without Administrative rights on the Azure subscription

I cannot run make deploy due to limited rights on our public sector company's Azure subscription, i.e. the requirement to have administrative rights on the subscription. For security reasons, such rights are uncommon in most public sector organizations, especially for experimental deployments like this accelerator. What I have been granted is ownership of a resource group, but since I would have to create resources at the subscription level to deploy this stack, I am stuck getting this application running for demonstration purposes.

Desired: the ability to deploy into an existing, owned resource group, or even a Kubernetes or other containerized build.

I have tried to deploy to an existing resource group under my ownership, to no avail. Alternatively I could build an application myself, but that is out of scope for this demonstrator, and I believe easier deployment would benefit these accelerators.


ChatGPT Model versions updated in Australia East

Describe the bug
Error when doing deployment to Australia East
InvalidResourceProperties - The specified SKU 'Standard' for model 'gpt-35-turbo 0301' is not supported in this region 'australiaeast'.
To fix this, I updated main.bicep line 147 to version: '0613', as this is the only version available in Australia East.

To Reproduce
Steps to reproduce the behavior:
in bash terminal $ make deploy
Expected behavior
Should complete deployment

Alpha version details

  • GitHub branch: main
  • Latest commit: [obtained by running git log -n 1 <branchname>]

Additional context
Add any other context about the problem here.

FileFormRecSubmissionPDF - An error occurred - 'Response' object has no attribute 'code'

The FileFormRecSubmissionPDF stage fails to complete because it accesses a non-existent attribute on the HTTP response object.

To Reproduce
Steps to reproduce the behavior:

  1. Clone the Beta branch
  2. Fill in local.env
  3. make deploy
  4. Upload files to upload container
  5. Open CosmosDB data explorer
  6. Wait until status comes through with error on FileFormRecSubmissionPDF stage

Expected behavior
The FileFormRecSubmissionPDF stage should complete successfully, or re-queue after calling the Form Recognizer API.

Screenshots

Beta version details

  • GitHub branch: 0.2-Beta
  • Latest commit: commit e477742 (HEAD -> 0.2-Beta, origin/0.2-Beta)

Additional context
Suspected offending line of code:

statusLog.upsert_document(blob_path, f'{function_name} - Error on PDF submission to FR - {response.code} {response.message}', StatusClassification.ERROR, State.ERROR)
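A likely fix is to read the attribute names the HTTP client actually provides. This is a sketch under the assumption that the response behaves like a `requests.Response`, which exposes `status_code` and `reason` rather than `code` and `message`; the helper name is hypothetical:

```python
def format_fr_error(function_name: str, response) -> str:
    """Build the error status message from attributes that exist on the
    response object. getattr with defaults keeps this safe even if a
    different client library is swapped in."""
    status = getattr(response, "status_code", "unknown")
    reason = getattr(response, "reason", "")
    return f"{function_name} - Error on PDF submission to FR - {status} {reason}"
```

The formatted string could then be passed to statusLog.upsert_document in place of the f-string that dereferences response.code.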

Error: Diagnostic settings does not support retention for new diagnostic settings

After a couple of attempts to deploy the PubSec suite (deploy, delete, repeat, switching from Australia East to East US), I began to encounter the error 'Diagnostic settings does not support retention for new diagnostic settings.' and the deployment would fail. With each attempt I had deleted all of the services created by the previous attempt and changed WORKSPACE="" to be unique.

I happened to come across this article:
https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/migrate-to-azure-storage-lifecycle-policy

After changing lines 138, 146 and 156 in the main.bicep file, setting the days value to 0 (instead of the default value of 30), the deployment completed successfully.

Based on the information in the article, we'll need to update these settings after September, when the deprecation comes into effect.

Non-PDF text-based documents should not queue to text_enrichment queue in 0.3-Gamma

Describe the bug
Currently, non-PDF text-based files get queued to the "text_enrichment" queue after chunking is complete, which is not needed.

Expected behavior
Non-PDF text-based files should report a status of "Complete" after chunking is finished.
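The desired routing can be sketched as a simple extension check. The extension set and stage names below are illustrative assumptions, not the pipeline's actual tables:

```python
from pathlib import Path

# Hypothetical set of text-based non-PDF extensions handled by the pipeline
TEXT_BASED_EXTENSIONS = {".txt", ".md", ".html", ".htm", ".csv", ".json"}

def next_step_after_chunking(filename: str) -> str:
    """Route a file after chunking: text-based non-PDF files finish here
    with status "Complete"; everything else (e.g. PDFs) continues to the
    text_enrichment queue."""
    ext = Path(filename).suffix.lower()
    if ext in TEXT_BASED_EXTENSIONS:
        return "Complete"
    return "text_enrichment"
```

With a check like this before the enqueue call, the status log would show "Complete" for text-based files instead of an unnecessary enrichment pass.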

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser Edge
  • Version

Version details

  • GitHub branch: main
  • Latest commit: commit a2a9f4e (HEAD -> main, origin/main, origin/HEAD)

Additional context
None

Exception in utilities.py/build_chunks

An exception is raised here when processing a non-PDF document, because document_map['structure'][0]["subtitle"] is only set in build_document_map_pdf, not in build_document_map_html.

I have fixed this in my code as below:

# Initialize the heading context so every structure entry carries
# title/subtitle/section keys, matching what build_document_map_pdf produces
section = ''
subtitle = ''
title = ''

for tag in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'table']):
    if tag.name in ['h3', 'h4', 'h5', 'h6']:
        section = tag.get_text(strip=True)
    elif tag.name == 'h2':
        subtitle = tag.get_text(strip=True)
    elif tag.name == 'h1':
        title = tag.get_text(strip=True)
    elif tag.name == 'p' and tag.get_text(strip=True):
        document_map["structure"].append({
            "type": "text",
            "text": tag.get_text(strip=True),
            "title": title,
            "subtitle": subtitle,
            "section": section,
            "page_number": 1
        })
    elif tag.name == 'table' and tag.get_text(strip=True):
        document_map["structure"].append({
            "type": "table",
            "text": str(tag),
            "title": title,
            "subtitle": subtitle,
            "section": section,
            "page_number": 1
        })
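An alternative defensive fix on the consuming side is to read the optional keys with dict.get, so build_chunks tolerates maps from either builder. The helper below is a hypothetical illustration of that approach:

```python
def chunk_metadata(entry: dict) -> tuple[str, str, str]:
    """Read optional structure keys defensively, so chunking works whether
    the document map came from the PDF or the HTML builder."""
    return (
        entry.get("title", ""),
        entry.get("subtitle", ""),
        entry.get("section", ""),
    )
```

This way, a map entry missing "subtitle" yields an empty string instead of raising KeyError.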

vite build error: "type" is not exported by "__vite-browser-external"

Describe the bug
When running make deploy, at the vite build stage I get the following errors:

12 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities

> [email protected] build
> tsc && vite build

vite v4.4.0 building for production...
node_modules/@azure/storage-blob/dist-esm/storage-blob/src/TelemetryPolicyFactory.js (32:70) "type" is not exported by "__vite-browser-external", imported by "node_modules/@azure/storage-blob/dist-esm/storage-blob/src/TelemetryPolicyFactory.js".
node_modules/@azure/storage-blob/dist-esm/storage-blob/src/TelemetryPolicyFactory.js (32:83) "release" is not exported by "__vite-browser-external", imported by "node_modules/@azure/storage-blob/dist-esm/storage-blob/src/TelemetryPolicyFactory.js".
Terminated
make: *** [Makefile:15: build] Error 143

To Reproduce
Steps to reproduce the behavior:

  1. az login
  2. edit scripts/environment/local.env
  3. make deploy
  4. See error

Expected behavior
The deploy process should complete without errors.

Screenshots

Desktop (please complete the following information):

  • OS: Codespace - Debian GNU/Linux 11 (bullseye), Host OS: Ubuntu 22.04.02 LTS
  • Browser: Firefox (same issue in vscode)
  • Version: Firefox114.0.1 (64-bit)

Alpha version details

  • GitHub branch: main
  • Latest commit:

commit b5dc8b368758b479b4f94b061b43c4b8ac94cb00 (HEAD -> main, origin/main, origin/HEAD)
Merge: ca99857 797ff84
Author: dayland <[email protected]>
Date:   Fri Jun 23 08:11:16 2023 +0000

    Merge pull request #101 from microsoft/dayland/5707-missing-search-indexer-property
    
    add "allowSkillsetToReadFileData" property to search indexer

Additional context
Add any other context about the problem here.

List of CI/CD pipeline parameters are missing values

Describe the bug
When trying to set up an Azure DevOps pipeline using the azdo.yaml pipeline, not all parameters required are documented at https://github.com/microsoft/PubSec-Info-Assistant/blob/main/pipelines/ci-cd%20pipelines.md.

Expected behavior
All pipeline parameters required should be documented.

Screenshots

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser n/a
  • Version 0.3-Gamma

Alpha version details

  • GitHub branch: main
  • Latest commit: 0.3-Gamma
