Software Testing Basics

Explore top LinkedIn content from expert professionals.

Brij Kishore Pandey (Influencer)

AI Architect & AI Engineer | Building Agentic Systems & Scalable AI Solutions

729,330 followers · 2 years ago
Demystifying Software Testing

1. Functional Testing: The Basics

- Unit Testing: Isolating individual code units to ensure they work as expected. Think of it as testing each brick before building a wall.
- Integration Testing: Verifying how different modules work together. Imagine testing how the bricks fit into the wall.
- System Testing: Putting it all together and ensuring the entire system functions as designed. Now you test the whole building for stability and functionality.
- Acceptance Testing: The final hurdle! Here, users or stakeholders confirm the software meets their needs. Think of it as the grand opening ceremony for your building.

2. Non-Functional Testing: Beyond the Basics

- Performance Testing: Assessing speed, responsiveness, and scalability under different loads. Imagine testing how many people your building can safely accommodate.
- Security Testing: Identifying and mitigating vulnerabilities to protect against cyberattacks. Think of it as installing security systems and testing their effectiveness.
- Usability Testing: Evaluating how easy and intuitive the software is to use. Imagine testing how user-friendly your building is for navigation and accessibility.

3. Other Testing Avenues: The Specialized Crew

- Regression Testing: Ensuring new changes haven't broken existing functionality. Imagine checking your building for cracks after renovations.
- Smoke Testing: A quick sanity check to ensure basic functionality before further testing. Think of turning on the lights and checking the basic systems before a deeper inspection.
- Exploratory Testing: Unstructured, creative testing to uncover unexpected issues. Imagine a detective searching for hidden clues in your building.

Have I overlooked anything? Please share your thoughts; your insights are priceless to me.
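To make the brick-and-wall distinction concrete, here is a minimal sketch in Python. The `apply_discount` function and the `Cart` class are hypothetical names invented for illustration; they do not come from the post. The first test checks one unit in isolation, the second checks two pieces working together.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical unit under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class Cart:
    """Hypothetical module that relies on apply_discount."""

    def __init__(self):
        self.prices = []

    def add(self, price: float) -> None:
        self.prices.append(price)

    def total(self, discount_percent: float = 0) -> float:
        return apply_discount(sum(self.prices), discount_percent)


def test_apply_discount_unit():
    # Unit test: a single function checked on its own (one brick).
    assert apply_discount(200.0, 25) == 150.0
    try:
        apply_discount(100.0, 150)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an invalid percent")


def test_cart_total_integration():
    # Integration test: Cart and apply_discount together (bricks in the wall).
    cart = Cart()
    cart.add(40.0)
    cart.add(60.0)
    assert cart.total(discount_percent=10) == 90.0
```

Both test functions follow the naming convention that pytest collects, and they can also be called directly.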
96 comments

Mohan Nayak

Data Analyst | Automating MIS & Business Reporting using Excel, Power BI, SQL & Python | Manufacturing & Finance Reporting

57,633 followers · 1 year ago
Types of Software Testing: A Comprehensive Overview

1. Manual Testing
Manual testing relies on human effort to identify bugs and confirm that the software meets requirements. It includes:
- White Box Testing: Focuses on the internal structure and logic of the code.
- Black Box Testing: Concentrates on functionality, without knowledge of the internal code.
- Grey Box Testing: Combines White Box and Black Box techniques, with partial insight into the code.

2. Automation Testing
Automation testing uses scripts and tools to execute tests efficiently, giving faster results for repetitive tasks. It complements manual testing by reducing time and effort.

3. Functional Testing
Functional testing verifies that the application behaves as expected and satisfies its functional requirements. Subtypes include:
- Unit Testing: Validates individual components or units of the application.
- Usability Testing: Ensures the application is user-friendly and intuitive. (Many taxonomies place usability under non-functional testing instead.)

Functional testing further extends to:
- Integration Testing: Tests the interaction between integrated modules. It has two methods:
  - Incremental Testing:
    - Bottom-Up Approach: Starts testing with lower-level modules.
    - Top-Down Approach: Begins testing with higher-level modules.
  - Non-Incremental Testing: Tests all modules as a single unit (often called big-bang integration).
- System Testing: Tests the entire system as a whole to ensure it meets specified requirements.

4. Non-Functional Testing
Non-functional testing evaluates the performance, reliability, scalability, and other non-functional aspects of the application. Key subtypes include:
- Performance Testing:
  - Load Testing: Checks the application's behavior under expected load.
  - Stress Testing: Tests the application's stability under extreme conditions.
  - Scalability Testing: Assesses the application's ability to scale up.
  - Stability Testing: Ensures consistent performance over time.
- Compatibility Testing: Verifies that the application works across various devices, platforms, or operating systems.

Why Software Testing Matters
Testing cannot guarantee a bug-free application, but it reduces defects and helps deliver a reliable, high-performing one. By combining manual and automated approaches with functional and non-functional testing techniques, developers can deliver a robust product that meets both user expectations and business requirements. Understanding these testing types helps teams choose the right strategy for their software.
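The top-down and bottom-up approaches to incremental integration can be sketched as follows. This is only an illustration: `InvoiceBuilder`, `RealTaxService`, and `TaxServiceStub` are hypothetical names, and the flat 20% tax rate is an assumption chosen to keep the arithmetic simple. Top-down testing replaces the not-yet-integrated lower module with a stub; bottom-up testing exercises the lower module first, with the test itself acting as the driver.

```python
class TaxServiceStub:
    """Stub for a lower-level module that is not integrated yet."""

    def tax_for(self, amount: float) -> float:
        return 10.0  # canned answer, no real logic


class RealTaxService:
    """Hypothetical lower-level module: flat 20% tax (assumed rate)."""

    def tax_for(self, amount: float) -> float:
        return round(amount * 0.20, 2)


class InvoiceBuilder:
    """Hypothetical higher-level module that depends on a tax service."""

    def __init__(self, tax_service):
        self.tax_service = tax_service

    def total(self, amount: float) -> float:
        return amount + self.tax_service.tax_for(amount)


def test_top_down_with_stub():
    # Top-down: test the high-level module first, with the lower level stubbed.
    assert InvoiceBuilder(TaxServiceStub()).total(100.0) == 110.0


def test_bottom_up_with_driver():
    # Bottom-up: test the low-level module first; this function is the driver.
    assert RealTaxService().tax_for(50.0) == 10.0


def test_modules_integrated():
    # Final increment: both real modules exercised together.
    assert InvoiceBuilder(RealTaxService()).total(50.0) == 60.0
```

A non-incremental (big-bang) run would skip the first two tests and rely on `test_modules_integrated` alone, which makes a failure harder to localize.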
116 comments

Sohan Sharma

Assistant Manager | Civil |

12,194 followers · 1 year ago
Initial Pile Load Testing

Compression Test Pile (Crown Method)

The crown method for pile compression testing uses a structural steel crown and ground anchors instead of a traditional steel grillage and dead load system. A hydraulic jack pushes against the crown, applying a downward force to the pile, while ground anchors provide the reaction force. This method is suitable for locations with limited space and is a variation of static pile load testing.

Here's a more detailed breakdown:

1. Setup: A structural steel "crown" is placed on top of the pile. Hydraulic jacks are positioned between the pile head and the base of the crown. Diagonal ground anchors are installed from the pile head to the surrounding soil to provide the reaction force. A datum bar is used to measure pile movement during the test.
2. Load Application: The hydraulic jacks are progressively extended, applying an axial load to the pile. The load is increased incrementally.
3. Monitoring and Data Collection: With each increment of load, the pile's movement (displacement) is measured against the datum bar. Strain gauges, load cells, inclinometers, and displacement transducers may be installed on the pile to monitor its behavior under load.
4. Data Analysis: The collected data, including load-displacement behavior and strain distribution, is analyzed to assess the pile's performance and verify its capacity. Comparisons are made with design criteria to ensure the pile can withstand the design load.

Tension Test Pile (Pull-Out Test)

A pile pull-out test is a type of load test used to determine the resistance of a pile to vertical uplift forces. It involves applying a tensile (pulling) load to the pile head and measuring the force required to extract it from the ground, essentially testing the bond between the pile and the soil. This test is crucial for structures where uplift forces are a concern, such as foundations in expansive soils, or structures subjected to wind or water pressure.

Purpose of the Pull-Out Test:

- Determine Uplift Resistance: The primary goal is to assess how much tensile force a pile can withstand before it is pulled out of the ground.
- Verify Design Assumptions: The test helps engineers verify the design capacity of the pile and ensure it can handle the anticipated uplift loads.
- Evaluate Soil-Pile Bond: The test provides information about the strength and integrity of the bond between the pile and the surrounding soil.
- Quality Assurance: In some cases, pull-out tests are used to ensure the quality and integrity of ground anchors, like those used in solar panel installations.
38 comments

Milan Jovanović (Influencer)

Practical .NET and Software Architecture Tips | Microsoft MVP

281,130 followers · 4 months ago
If your test suite is 95% unit tests... you might be testing the wrong thing.

Unit tests are great:
- fast
- cheap
- perfect for domain rules

But most real bugs don’t live in pure functions. They show up when code meets reality:
- database mappings
- transactions
- serialization
- config
- external services

That’s why my rule of thumb is:
- Unit test your domain logic
- Integration test your use cases

Best setup I’ve used: integration tests with Testcontainers. If it runs in Docker, you can spin up real dependencies locally and in CI without pain.

Want to set it up from scratch? Here's a complete guide: https://lnkd.in/dAfw5dtM

What’s your testing setup?

---

Do you want to simplify your development process? Grab my Clean Architecture template here and save 7 days of development time: https://lnkd.in/dYNsNb52
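The rule of thumb above (unit test the domain logic, integration test the use cases) might look like the sketch below. Two caveats: the post is about .NET and Testcontainers, whereas this example is Python and swaps in an in-memory SQLite database so that it runs without Docker; and `Order`, `place_order`, and the table layout are hypothetical names chosen for illustration. With Testcontainers, the connection would point at a containerized copy of the real database instead.

```python
import sqlite3


class Order:
    """Hypothetical domain object carrying one pure rule."""

    def __init__(self, order_id: int, quantity: int, unit_price_cents: int):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.order_id = order_id
        self.quantity = quantity
        self.unit_price_cents = unit_price_cents

    def total_cents(self) -> int:
        return self.quantity * self.unit_price_cents


def make_database() -> sqlite3.Connection:
    """Stand-in for a real database; SQLite in memory, not a container."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "quantity INTEGER NOT NULL, unit_price_cents INTEGER NOT NULL)"
    )
    return conn


def place_order(conn: sqlite3.Connection, order: Order) -> None:
    """Hypothetical use case: persist the order inside a transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO orders (id, quantity, unit_price_cents) VALUES (?, ?, ?)",
            (order.order_id, order.quantity, order.unit_price_cents),
        )


def load_order(conn: sqlite3.Connection, order_id: int) -> Order:
    row = conn.execute(
        "SELECT id, quantity, unit_price_cents FROM orders WHERE id = ?",
        (order_id,),
    ).fetchone()
    return Order(*row)


def test_domain_rule_unit():
    # Unit test: pure domain logic, no I/O involved.
    assert Order(1, 3, 250).total_cents() == 750


def test_place_order_integration():
    # Integration test: mapping, constraint, and transaction against a real engine.
    conn = make_database()
    place_order(conn, Order(1, 3, 250))
    assert load_order(conn, 1).total_cents() == 750
    try:
        place_order(conn, Order(1, 1, 100))  # duplicate primary key
    except sqlite3.IntegrityError:
        pass
    else:
        raise AssertionError("expected IntegrityError for a duplicate id")
    assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 1
```

The integration test catches problems the unit test cannot see, such as a wrong column order in the mapping or a missing constraint.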
55 comments

Ahmed Elbayomi

Site Manager at Trevi S.p.A | Ground Improvement specialist

6,463 followers · 1 year ago
#Soil investigation doesn’t end in the field. Once samples are retrieved from boreholes, the real detective work begins in the laboratory. Lab testing gives engineers the quantitative properties needed to evaluate soil behavior and design safe, cost-effective foundations.

1. Atterberg Limits Test
- Tests: Liquid Limit (LL), Plastic Limit (PL), and Plasticity Index (PI)
- Purpose: Determines the consistency, plasticity, and behavior of fine-grained soils (clays and silts).
- Benefit: Helps classify soil types (CL, CH, etc.) and predict shrink/swell potential.
- Video: https://lnkd.in/dWdfN4kA

2. Grain Size Distribution (Sieve and Hydrometer Analysis)
- Tests: Mechanical Sieve (for sands and gravels), Hydrometer (for silts and clays)
- Purpose: Measures the percentage of different particle sizes in the soil.
- Benefit: Critical for soil classification (e.g., GP, SM, CL) and assessing permeability.
- Video: https://lnkd.in/dE_93UFf

3. Standard Proctor and Modified Proctor Compaction Tests
- Purpose: Determines the optimum moisture content and maximum dry density for soil compaction.
- Benefit: Vital for earthworks, roadbeds, and embankment design; ensures proper field compaction.
- Video: https://lnkd.in/drii_FCm

4. Unconfined Compressive Strength (UCS) Test
- Purpose: Measures the compressive strength of cohesive soils (especially clay).
- Benefit: Provides a quick measure of shear strength, used in stability and bearing capacity calculations.
- Video: https://lnkd.in/ddUxHSXk

5. Triaxial Shear Test (UU, CU, CD)
- Purpose: Simulates field stress conditions to measure shear strength under various drainage conditions.
- Benefit: Offers more accurate strength parameters (ϕ and c) for slope stability and foundation design.
- Video: https://lnkd.in/d9aFgn29

6. Consolidation Test (Oedometer Test)
- Purpose: Measures the settlement behavior of soil under long-term loading.
- Benefit: Predicts how much and how fast the soil will compress under foundation loads, which is essential for buildings, tanks, and bridges.
- Video: https://lnkd.in/dRQRJVkA

7. Permeability Test
- Tests: Constant Head (for coarse soils), Falling Head (for fine soils)
- Purpose: Measures the rate at which water flows through soil.
- Benefit: Crucial for drainage design, retaining structures, and seepage control.
- Video: https://lnkd.in/dhKe9XtV

8. Specific Gravity Test
- Purpose: Measures the ratio of the unit weight of soil solids to that of water.
- Benefit: Important in calculating void ratio, porosity, and degree of saturation.
- Video: https://lnkd.in/dHeH7azw

9. Chemical Testing (pH, Sulfate, Chloride Content, Organic Matter)
- Purpose: Identifies aggressive soil conditions.
- Benefit: Protects foundations and underground utilities from chemical attack and corrosion.
- Video: https://lnkd.in/d2Yzc43y

#SoilInvestigation #LabTesting
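Two of the quantities in the list reduce to short formulas, sketched here in Python. The relations are the standard textbook ones (PI = LL − PL, S·e = w·Gs, n = e / (1 + e)); the function names and the sample numbers in the usage note are illustrative assumptions and do not come from the post.

```python
def plasticity_index(liquid_limit: float, plastic_limit: float) -> float:
    """PI = LL - PL, with both limits given as water content in percent."""
    if plastic_limit > liquid_limit:
        raise ValueError("plastic limit cannot exceed liquid limit")
    return liquid_limit - plastic_limit


def void_ratio(specific_gravity: float, water_content: float, saturation: float) -> float:
    """e = Gs * w / S, from S * e = w * Gs (w and S as decimal fractions)."""
    if not 0 < saturation <= 1:
        raise ValueError("saturation must be in (0, 1]")
    return specific_gravity * water_content / saturation


def porosity(e: float) -> float:
    """n = e / (1 + e)."""
    return e / (1 + e)
```

For example, a clay with LL = 45 and PL = 20 has PI = 25, and a fully saturated soil with Gs = 2.70 and a water content of 20% has a void ratio of about 0.54.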
35 comments

Sougata Bhattacharjee

Samsung (SSIR) | Ex - Intel | 6 times TEDx Speaker | ASIC Verification | Proficient in SV, UVM, OVM, SVA, Verilog | Keynote Speaker at Engineering Colleges (IITs/NITs) | Paper publication at VLSI Conferences

55,480 followers · 1 year ago
During the initial phase of my career in VLSI, I realised that writing testcases is as important as testbench development. A testcase in any language, be it Verilog, VHDL, SystemVerilog, or UVM, is used not only to verify the functional correctness and integrity of the design but also to point out areas where the testbench could be improved.

Below are the most important categories of testcases:

[1] Functional Tests --> These verify the functionality or feature of an IP/module or a subsystem.
[2] Register-based Tests --> RW tests, RO/WO tests, default read/hard reset tests, soft reset tests, negative RO/WO tests, aliasing, broadcasting, etc.
[3] Connectivity Tests
[4] Clock and Reset Tests
[5] Boot-up tests, wake-up sequence tests, and training sequence tests. For example, in the case of DDR: MPC training, RD DQ calibration, command bus training, write leveling, etc.
[6] Command and Sequence-based Tests
[7] Overlapping and Unallocated Region Tests
[8] Back-to-back data transfer-based Tests
[9] UPF Tests --> Power domain, level shifter, clock gating, voltage domain, etc.
[10] Code Coverage Tests --> Toggle, expression, branch, FSM, and conditional coverage holes are measured, and tests are written against those holes to exercise the DUT completely.
[11] Functional Coverage Tests --> The functionality of the DUT is measured with the help of bins. There are several ways to do it. If there are coverage holes, more bins are coded to cover those areas, and complex scenarios are covered with cross coverage and the binsof/intersect constructs.
[12] Assertions --> These are checks against the design: insertion points within the design that improve observability and debugging ability.

The above are some of the categories of tests that need to be applied while checking a design. To achieve all of the above, testcases are broadly classified into the following two types:

[1] Directed Testcase: These are the scenarios that the verification engineers can think of or anticipate.
[2] Random Testcase: These are the scenarios where the largest number of bugs can be caught. The random seeds hit many different use cases that cannot be anticipated in advance and so have a good chance of catching design issues.

Random tests can in turn be classified into the following two categories:

[1] Corner cases --> These are bugs that can only be caught when many different scenarios are processed together or overlap. The best way to catch them is to run repeated regressions with more seeds.
[2] Stress testing --> These tests are useful for checking the performance and scalability of the DUT under multiple concurrent activities and unpredictable scenarios.

#vlsi #asic #electricalengineering #semiconductorindustry
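The difference between directed and random testcases can be illustrated with a small model. This is a Python stand-in, not a SystemVerilog/UVM testbench: `dut_add` is a hypothetical 8-bit saturating adder with a deliberately planted bug, and `reference_add` plays the role of the golden model. The directed vectors all pass, while the seeded random regression finds the corner case.

```python
import random

MAX_VALUE = 0xFF  # 8-bit data path (assumed width for this illustration)


def reference_add(a: int, b: int) -> int:
    """Golden model: 8-bit saturating add."""
    return min(a + b, MAX_VALUE)


def dut_add(a: int, b: int) -> int:
    """Stand-in for the design under test, with a planted corner-case bug."""
    if a == MAX_VALUE:
        return (a + b) & MAX_VALUE  # bug: wraps around instead of saturating
    return min(a + b, MAX_VALUE)


def run_directed():
    """Directed testcases: only the scenarios the engineer anticipated."""
    vectors = [(0, 0), (1, 2), (100, 100), (200, 100), (0, 255)]
    return [(a, b) for a, b in vectors if dut_add(a, b) != reference_add(a, b)]


def run_random_regression(seeds, transactions_per_seed):
    """Random testcases: one reproducible stimulus stream per seed."""
    failures = []
    for seed in seeds:
        rng = random.Random(seed)  # same seed gives the same stimulus, so failures replay
        for _ in range(transactions_per_seed):
            a, b = rng.randrange(256), rng.randrange(256)
            if dut_add(a, b) != reference_add(a, b):
                failures.append((seed, a, b))
    return failures
```

Calling `run_directed()` returns an empty list because no directed vector uses `a == 0xFF` with a non-zero `b`. Calling `run_random_regression(range(5), 2000)` reports mismatches together with the seed that produced each one, which is what makes a failing regression reproducible.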
3 comments

Alon Bochman

12,695 followers · 1 year ago
It worked in testing. Until it didn’t.

The model passed every unit test. The UI felt smooth. The demo? Flawless.

Then users showed up. And suddenly...
→ Retrieval returned the wrong context
→ The model started responding like it had amnesia
→ Edge-case prompts broke the flow
→ Confidence quietly slipped

No big crashes. No obvious bugs. Just a slow unraveling of trust.

Most AI products don’t fail because they’re broken. They fail because they weren’t tested the way real people actually use them. And that’s the trap. You test clean. They use messy.

That’s why we built RagMetrics: to help AI teams test beyond the “happy path.”
✅ Real-world prompts
✅ Retrieval stress tests
✅ Output consistency checks
✅ Actual behavior under pressure, not assumptions

Because fixing it after launch is expensive. But losing trust? That’s even harder to repair.

If you’ve ever shipped something that “worked”… until a real user touched it, you know exactly what I’m talking about. Let’s not find out the hard way.

🧠 How are you testing your LLM in the wild? Would love to hear what’s working for you.

16 comments

Ryan Mitchell

O'Reilly / Wiley Author | LinkedIn Learning Instructor | Principal Software Engineer @ GLG

30,890 followers · 1 year ago
LLMs are great for data processing, but using new techniques doesn't mean you get to abandon old best practices. The precision and accuracy of LLMs still need to be monitored and maintained, just like with any other AI model.

Tips for maintaining accuracy and precision with LLMs:

• Define within your team EXACTLY what the desired output looks like. Any area of ambiguity should be resolved with a concrete answer. Even if the business "doesn't care," you should define a behavior. Letting the LLM make these decisions for you leads to high variance/low precision models that are difficult to monitor.
• Understand that the most gorgeously-written, seemingly clear and concise prompts can still produce trash. LLMs are not people and don't follow directions like people do. You have to test your prompts over and over and over, no matter how good they look.
• Make small prompt changes and carefully monitor each change. Changes should be version tracked and vetted by other developers.
• A small change in one part of the prompt can cause seemingly-unrelated regressions (again, LLMs are not people). Regression tests are essential for EVERY change. Organize a list of test case inputs, including those that demonstrate previously-fixed bugs, and test your prompt against them.
• Test cases should include "controls" where the prompt has historically performed well. Any change to the control output should be studied, and any incorrect change is a test failure.
• Regression tests should have a single documented bug and clearly-defined success/failure metrics. "If the output contains A, then pass. If output contains B, then fail." This makes it easy to quickly mark regression tests as pass/fail (ideally, automating this process). If a different failure/bug is noted, then it should still be fixed, but separately, and pulled out into a separate test.

Any other tips for working with LLMs and data processing?
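The "contains A, then pass; contains B, then fail" rule lends itself to a small automated harness. The sketch below is one possible shape for it, under stated assumptions: `RegressionCase`, `run_regression`, and the classification task are hypothetical, and `fake_model` is a stand-in for whatever function sends the prompt under test to a real LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RegressionCase:
    name: str                               # one documented bug or control per case
    input_text: str
    must_contain: str                       # "A": the case passes only if present
    must_not_contain: Optional[str] = None  # "B": the case fails if present


def run_regression(model: Callable[[str], str],
                   cases: List[RegressionCase]) -> Dict[str, bool]:
    """Run every case through the model and return pass/fail per case name."""
    results = {}
    for case in cases:
        output = model(case.input_text)
        passed = case.must_contain in output
        if case.must_not_contain is not None and case.must_not_contain in output:
            passed = False
        results[case.name] = passed
    return results


def fake_model(text: str) -> str:
    """Stand-in for the real LLM call; replace with your own client."""
    label = "invoice" if "invoice" in text.lower() else "other"
    return f"CATEGORY: {label}"


CASES = [
    # Control: an input the prompt has historically handled well.
    RegressionCase("control_plain_invoice", "Invoice #123 is attached.",
                   "CATEGORY: invoice"),
    # Previously fixed bug (hypothetical): upper-case input was misclassified.
    RegressionCase("bug_uppercase_invoice", "INVOICE OVERDUE",
                   "CATEGORY: invoice", "CATEGORY: other"),
    RegressionCase("control_newsletter", "Our monthly newsletter is out.",
                   "CATEGORY: other", "CATEGORY: invoice"),
]
```

Running `run_regression(fake_model, CASES)` after every prompt change gives a pass/fail map that can gate the change in CI. Substring checks are deliberately crude; they work best when the desired output format has been defined exactly, as the first tip recommends.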

4 comments

Ross Dawson (Influencer)

Futurist | Board advisor | Global keynote speaker | Founder: AHT Group - Informivity - Bondi Innovation | Humans + AI Leader | Bestselling author | Podcaster | LinkedIn Top Voice

36,358 followers · 1 year ago
Small variations in prompts can lead to very different LLM responses. Research that measures LLM prompt sensitivity uncovers what matters, and the strategies to get the best outcomes.

A new framework for prompt sensitivity, ProSA, shows that response robustness increases with factors including higher model confidence, few-shot examples, and larger model size.

Some strategies you should consider given these findings:

💡 Understand Prompt Sensitivity and Test Variability: LLMs can produce different responses with minor rephrasings of the same prompt. Testing multiple prompt versions is essential, as even small wording adjustments can significantly impact the outcome. Organizations may benefit from creating a library of proven prompts, noting which styles perform best for different types of queries.

🧩 Integrate Few-Shot Examples for Consistency: Including few-shot examples (demonstrative samples within prompts) enhances the stability of responses, especially in larger models. For complex or high-priority tasks, adding a few-shot structure can reduce prompt sensitivity. Standardizing few-shot examples in key prompts across the organization helps ensure consistent output.

🧠 Match Prompt Style to Task Complexity: Different tasks benefit from different prompt strategies. Knowledge-based tasks like basic Q&A are generally less sensitive to prompt variations than complex, reasoning-heavy tasks, such as coding or creative requests. For these complex tasks, using structured, example-rich prompts can improve response reliability.

📈 Use Decoding Confidence as a Quality Check: High decoding confidence (the model’s level of certainty in its responses) indicates robustness against prompt variations. Organizations can track confidence scores to flag low-confidence responses and identify prompts that might need adjustment, enhancing the overall quality of outputs.

📜 Standardize Prompt Templates for Reliability: Simple, standardized templates reduce prompt sensitivity across users and tasks. For frequent or critical applications, well-designed, straightforward prompt templates minimize variability in responses. Organizations should consider a “best-practices” prompt set that can be shared across teams to ensure reliable outcomes.

🔄 Regularly Review and Optimize Prompts: As LLMs evolve, so may prompt performance. Routine prompt evaluations help organizations adapt to model changes and maintain high-quality, reliable responses over time. Regularly revisiting and refining key prompts ensures they stay aligned with the latest LLM behavior.

Link to paper in comments.
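Testing multiple prompt versions can start with something as simple as an agreement ratio across paraphrases. The sketch below is not the metric defined in the ProSA paper; it is a plain majority-agreement score, and `agreement`, `stub_model`, and the example prompts are hypothetical names and data used only for illustration.

```python
from collections import Counter
from typing import Callable, List, Tuple


def agreement(model: Callable[[str], str],
              prompt_variants: List[str]) -> Tuple[str, float]:
    """Return the most common normalized answer and the share of variants giving it."""
    answers = [model(prompt).strip().lower() for prompt in prompt_variants]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / len(answers)


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call; it only reacts to the word 'capital'."""
    return "Paris" if "capital" in prompt.lower() else "Unknown"


VARIANTS = [
    "What is the capital of France?",
    "Name France's capital city.",
    "Which city hosts the French government?",
]
```

With the stub, `agreement(stub_model, VARIANTS)` returns `"paris"` with a score of 2/3, because the third rephrasing drops the keyword the stub relies on. A score well below 1.0 on a real model is a signal that the prompt or template needs attention.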
7 comments

Sahar Mor

I help researchers and builders make sense of AI | ex-Stripe | aitidbits.ai | Angel Investor

42,183 followers · 1 year ago
A new study shows that even the best financial LLMs hallucinate 41% of the time when faced with unexpected inputs.

FailSafeQA, a new benchmark from Writer, tests LLM robustness in finance by simulating real-world mishaps, including misspelled queries, incomplete questions, irrelevant documents, and OCR-induced errors.

Evaluating 24 top models revealed that:
* OpenAI’s o3-mini, the most robust, hallucinated in 41% of perturbed cases
* Palmyra-Fin-128k-Instruct, the model best at refusing irrelevant queries, still struggled 17% of the time

FailSafeQA uniquely measures:
(1) Robustness - performance across query perturbations (e.g., misspelled, incomplete)
(2) Context Grounding - the ability to avoid hallucinations when context is missing or irrelevant
(3) Compliance - balancing robustness and grounding to minimize false responses

Developers building financial applications should implement explicit error handling that gracefully addresses context issues, rather than solely relying on model robustness. Developing systems to proactively detect and respond to problematic queries can significantly reduce costly hallucinations and enhance trust in LLM-powered financial apps.

Benchmark details: https://lnkd.in/gq-mijcD
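The two ideas in the post, perturbing queries and handling bad context explicitly, can be sketched as follows. None of this is FailSafeQA code: `misspell`, `truncate`, `answer_with_guard`, and the keyword-overlap heuristic are hypothetical and deliberately crude, meant only to show where such a guard would sit in front of the model.

```python
from typing import Callable

REFUSAL = "I cannot answer: the provided context is missing or unrelated."


def misspell(query: str) -> str:
    """Perturbation: swap the first two letters of every word longer than three characters."""
    words = [w[1] + w[0] + w[2:] if len(w) > 3 else w for w in query.split()]
    return " ".join(words)


def truncate(query: str, keep_words: int = 3) -> str:
    """Perturbation: keep only the first few words to simulate an incomplete question."""
    return " ".join(query.split()[:keep_words])


def _terms(text: str, min_length: int = 1) -> set:
    """Lower-cased words with trailing punctuation removed."""
    return {w.lower().strip("?.,") for w in text.split() if len(w) >= min_length}


def answer_with_guard(query: str, context: str,
                      model: Callable[[str, str], str]) -> str:
    """Refuse explicitly instead of letting the model guess from bad context."""
    if not context.strip():
        return REFUSAL
    # Crude relevance check (an assumption): at least one longer query word appears in the context.
    if not _terms(query, min_length=4) & _terms(context):
        return REFUSAL
    return model(query, context)
```

In practice the perturbation functions generate the test inputs, and the guard is one of the behaviors under test: a missing or unrelated document should produce the refusal, not a confident answer.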
2 comments
