very cool Rod,
Where I'm struggling is:
Importing mixed media - scraped web sites, JSON files, complex CSV files (like KQL that includes double quotes and other common separators)
Once I have my data indexed I want to work with multiple indexes - e.g. security news, the MITRE ATT&CK library, etc. - so I haven't thought much about how to work with multiple indexes/dataframes.
keep it up!
(I'm doing all my stuff in Python/LangChain/OpenAI functions etc., but I'd love to have an Azure way to do it all.)
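For what it's worth, one way to think about the multiple-indexes problem is a named registry you can query per source or across all sources. This is a plain-Python toy sketch (not LangChain or Azure Cognitive Search specific); the index names and the keyword index are made up for illustration:

```python
# Illustrative sketch: several named "indexes" in one registry,
# with a query routed across any subset of them.
from collections import defaultdict

class IndexRegistry:
    """Holds multiple keyword indexes, one per data source."""

    def __init__(self):
        # index name -> {keyword -> set of document ids}
        self.indexes = defaultdict(lambda: defaultdict(set))

    def add(self, index_name, doc_id, text):
        # Naive keyword indexing: lowercase, split on whitespace.
        for word in text.lower().split():
            self.indexes[index_name][word].add(doc_id)

    def search(self, keyword, index_names=None):
        """Search one index, several, or (by default) all of them."""
        names = index_names or list(self.indexes)
        return {name: sorted(self.indexes[name][keyword.lower()])
                for name in names}

# Hypothetical sources mirroring the ones mentioned above.
registry = IndexRegistry()
registry.add("security_news", "news-1", "New phishing campaign targets banks")
registry.add("mitre_attack", "T1566", "Phishing: adversaries send phishing messages")

print(registry.search("phishing"))
```

In a real LangChain setup the values would be vector stores rather than keyword maps, but the routing pattern is the same: keep a dict of stores keyed by source and fan the query out.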
You might try putting each media type in its own container with its own indexer.
Hi Rod,
I'd be curious if you know of any good 'universal ingest scripts'.
For example there's a GitHub repo for 'PrivateGPT' that uses this script for ingesting PDF, TXT, and CSV files:
https://github.com/imartinez/privateGPT/blob/main/ingest.py
I feel that LangChain has the lead on making GPT practical with all they're putting into their code.
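For anyone reading along, the core of such a "universal ingest" script is just dispatch on file extension. Here's a stripped-down standard-library sketch in that spirit; it is not privateGPT's actual code, the function name is made up, and PDF handling is omitted because it needs a third-party library (e.g. pypdf):

```python
# Minimal "universal ingest" sketch: route a file to a loader by extension.
import csv
import json
from pathlib import Path

def ingest_file(path):
    """Return a list of text chunks from a .txt, .csv, or .json file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        # Whole file as one chunk; real scripts would split further.
        return [path.read_text(encoding="utf-8")]
    if suffix == ".csv":
        # One chunk per row; csv handles quoted fields with embedded commas.
        with path.open(newline="", encoding="utf-8") as f:
            return [", ".join(row) for row in csv.reader(f)]
    if suffix == ".json":
        # Re-serialize so nested structures become readable text.
        with path.open(encoding="utf-8") as f:
            return [json.dumps(json.load(f), indent=2)]
    raise ValueError(f"Unsupported file type: {suffix}")
```

A real version would add per-type chunking and a PDF branch, but the extension-dispatch skeleton is the whole trick.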
Nothing at the moment, but I could slap together a PowerShell script to do it.
What I'm doing now is assigning a drive mapping to my blob container and just updating/uploading files that way.
Thinking about it further, it could be possible to use an Event Hub to do it.
yeah, I've had some inconsistencies getting various data sources ingested into my dataframe, so I'm looking for a clean way to do it for the most common file formats. I'll keep looking, thanks.
e.g. ingesting enterprise-attack.json (not that I need to, just an example)
ingesting KQL queries in CSV format - finding a good delimiter can be tricky w/o writing a specific Python parser.
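One note on the delimiter problem: Python's built-in csv module already escapes embedded double quotes (by doubling them) and commas when quoting is enabled, so KQL usually round-trips without a custom parser or an exotic delimiter. A small sketch with a made-up query:

```python
# Round-tripping a KQL query through CSV with the stdlib csv module.
# QUOTE_ALL wraps every field, so pipes, commas, and embedded double
# quotes (escaped as "") survive intact.
import csv
import io

kql = 'SecurityAlert | where AlertName has "Suspicious" | project TimeGenerated, AlertName'

# Write one row: (query name, query text).
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["suspicious-alerts", kql])

# Read it back: no special delimiter needed.
buf.seek(0)
name, query = next(csv.reader(buf))
assert query == kql
```

The same applies when reading CSVs produced elsewhere, as long as the producer used standard CSV quoting rather than bare unquoted fields.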
The links to "QuickStart" pages don't appear to be working. Is there something wrong?
Thanks for the heads up, Mark. All fixed.
No copy and paste when in the side bar. 😕
Yeah...working on that.